A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits

Publication: Contribution to journal > Journal article > Research > Peer-reviewed

Standard

A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits. / Heller, Rasmus; Nursyifa, Casia; Garcia-Erill, Genís; Salmona, Jordi; Chikhi, Lounes; Meisner, Jonas; Korneliussen, Thorfinn Sand; Albrechtsen, Anders.

In: Molecular Ecology Resources, Vol. 21, No. 4, 2021, p. 1085-1097.

Harvard

Heller, R, Nursyifa, C, Garcia-Erill, G, Salmona, J, Chikhi, L, Meisner, J, Korneliussen, TS & Albrechtsen, A 2021, 'A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits', Molecular Ecology Resources, vol. 21, no. 4, pp. 1085-1097. https://doi.org/10.1111/1755-0998.13324

APA

Heller, R., Nursyifa, C., Garcia-Erill, G., Salmona, J., Chikhi, L., Meisner, J., Korneliussen, T. S., & Albrechtsen, A. (2021). A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits. Molecular Ecology Resources, 21(4), 1085-1097. https://doi.org/10.1111/1755-0998.13324

Vancouver

Heller R, Nursyifa C, Garcia-Erill G, Salmona J, Chikhi L, Meisner J et al. A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits. Molecular Ecology Resources. 2021;21(4):1085-1097. https://doi.org/10.1111/1755-0998.13324

Author

Heller, Rasmus ; Nursyifa, Casia ; Garcia-Erill, Genís ; Salmona, Jordi ; Chikhi, Lounes ; Meisner, Jonas ; Korneliussen, Thorfinn Sand ; Albrechtsen, Anders. / A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits. In: Molecular Ecology Resources. 2021 ; Vol. 21, No. 4. pp. 1085-1097.

Bibtex

@article{cbe490e742914e0daf4595f89a71b90b,
title = "A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits",
abstract = "Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, these come with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to $\sim$8x depth-of-coverage we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2--4x). We compared the RADseq SFS with medium-depth ($\sim$13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of $\sim$16\%, which dropped to $\sim$5\% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.",
keywords = "allelic dropout, genetic diversity, genotype calling, genotype likelihood, RADseq, site frequency spectrum",
author = "Rasmus Heller and Casia Nursyifa and Gen{\'i}s Garcia-Erill and Jordi Salmona and Lounes Chikhi and Jonas Meisner and Korneliussen, {Thorfinn Sand} and Anders Albrechtsen",
year = "2021",
doi = "10.1111/1755-0998.13324",
language = "English",
volume = "21",
pages = "1085--1097",
journal = "Molecular Ecology Resources",
issn = "1755-098X",
publisher = "Wiley-Blackwell",
number = "4",

}

RIS

TY - JOUR

T1 - A reference-free approach to analyse RADseq data using standard next generation sequencing toolkits

AU - Heller, Rasmus

AU - Nursyifa, Casia

AU - Garcia-Erill, Genís

AU - Salmona, Jordi

AU - Chikhi, Lounes

AU - Meisner, Jonas

AU - Korneliussen, Thorfinn Sand

AU - Albrechtsen, Anders

PY - 2021

Y1 - 2021

N2 - Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, these come with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2–4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.

AB - Genotyping-by-sequencing methods such as RADseq are popular for generating genomic and population-scale data sets from a diverse range of organisms. These often lack a usable reference genome, restricting users to RADseq-specific software for processing. However, these come with limitations compared to generic next generation sequencing (NGS) toolkits. Here, we describe and test a simple pipeline for reference-free RADseq data processing that blends de novo elements from STACKS with the full suite of state-of-the-art NGS tools. Specifically, we use the de novo RADseq assembly employed by STACKS to create a catalogue of RAD loci that serves as a reference for read mapping, variant calling and site filters. Using RADseq data from 28 zebra sequenced to ~8x depth-of-coverage we evaluate our approach by comparing the site frequency spectra (SFS) to those from alternative pipelines. Most pipelines yielded similar SFS at 8x depth, but only a genotype-likelihood-based pipeline performed similarly at low sequencing depth (2–4x). We compared the RADseq SFS with medium-depth (~13x) shotgun sequencing of eight overlapping samples, revealing that the RADseq SFS was persistently slightly skewed towards rare and invariant alleles. Using simulations and human data we confirm that this is expected when there is allelic dropout (AD) in the RADseq data. AD in the RADseq data caused a heterozygosity deficit of ~16%, which dropped to ~5% after filtering AD. Hence, AD was the most important source of bias in our RADseq data.

KW - allelic dropout

KW - genetic diversity

KW - genotype calling

KW - genotype likelihood

KW - RADseq

KW - site frequency spectrum

U2 - 10.1111/1755-0998.13324

DO - 10.1111/1755-0998.13324

M3 - Journal article

C2 - 33434329

AN - SCOPUS:85100595569

VL - 21

SP - 1085

EP - 1097

JO - Molecular Ecology Resources

JF - Molecular Ecology Resources

SN - 1755-098X

IS - 4

ER -

ID: 257540595