Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. / Peona, Valentina; Blom, Mozes P. K.; Xu, Luohao; Burri, Reto; Sullivan, Shawn; Bunikis, Ignas; Liachko, Ivan; Haryoko, Tri; Jonsson, Knud A.; Zhou, Qi; Irestedt, Martin; Suh, Alexander.
In: Molecular Ecology Resources, Vol. 21, No. 1, 2021, p. 263-286.
RIS
TY - JOUR
T1 - Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise
AU - Peona, Valentina
AU - Blom, Mozes P. K.
AU - Xu, Luohao
AU - Burri, Reto
AU - Sullivan, Shawn
AU - Bunikis, Ignas
AU - Liachko, Ivan
AU - Haryoko, Tri
AU - Jonsson, Knud A.
AU - Zhou, Qi
AU - Irestedt, Martin
AU - Suh, Alexander
PY - 2021
Y1 - 2021
N2 - Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
KW - chromosome-level assembly
KW - GC content
KW - genome assembly
KW - Hi-C
KW - long reads
KW - satellite repeat
KW - transposable element
KW - TRANSPOSABLE ELEMENTS
KW - LIBRARY PREPARATION
KW - HIDDEN GENES
KW - LONG-READ
KW - IN-VITRO
KW - G4 DNA
KW - NOVO
KW - ANNOTATION
KW - EVOLUTION
KW - RNA
U2 - 10.1111/1755-0998.13252
DO - 10.1111/1755-0998.13252
M3 - Journal article
C2 - 32937018
VL - 21
SP - 263
EP - 286
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
SN - 1755-098X
IS - 1
ER -
ID: 250540911