Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

Publication: Contribution to journal › Journal article › Research › peer-review

Standard

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. / Peona, Valentina; Blom, Mozes P. K.; Xu, Luohao; Burri, Reto; Sullivan, Shawn; Bunikis, Ignas; Liachko, Ivan; Haryoko, Tri; Jonsson, Knud A.; Zhou, Qi; Irestedt, Martin; Suh, Alexander.

In: Molecular Ecology Resources, Vol. 21, No. 1, 2021, pp. 263-286.

Harvard

Peona, V, Blom, MPK, Xu, L, Burri, R, Sullivan, S, Bunikis, I, Liachko, I, Haryoko, T, Jonsson, KA, Zhou, Q, Irestedt, M & Suh, A 2021, 'Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise', Molecular Ecology Resources, vol. 21, no. 1, pp. 263-286. https://doi.org/10.1111/1755-0998.13252

APA

Peona, V., Blom, M. P. K., Xu, L., Burri, R., Sullivan, S., Bunikis, I., Liachko, I., Haryoko, T., Jonsson, K. A., Zhou, Q., Irestedt, M., & Suh, A. (2021). Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources, 21(1), 263-286. https://doi.org/10.1111/1755-0998.13252

Vancouver

Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I et al. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources. 2021;21(1):263-286. https://doi.org/10.1111/1755-0998.13252

Author

Peona, Valentina ; Blom, Mozes P. K. ; Xu, Luohao ; Burri, Reto ; Sullivan, Shawn ; Bunikis, Ignas ; Liachko, Ivan ; Haryoko, Tri ; Jonsson, Knud A. ; Zhou, Qi ; Irestedt, Martin ; Suh, Alexander. / Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. In: Molecular Ecology Resources. 2021 ; Vol. 21, No. 1. pp. 263-286.

Bibtex

@article{3f0e21876cfc42889c6fd4bf258217d1,
title = "Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise",
abstract = "Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic {"}dark matter{"}) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.",
keywords = "chromosome-level assembly, GC content, genome assembly, Hi-C, long reads, satellite repeat, transposable element, TRANSPOSABLE ELEMENTS, LIBRARY PREPARATION, HIDDEN GENES, LONG-READ, IN-VITRO, G4 DNA, NOVO, ANNOTATION, EVOLUTION, RNA",
author = "Valentina Peona and Blom, {Mozes P. K.} and Luohao Xu and Reto Burri and Shawn Sullivan and Ignas Bunikis and Ivan Liachko and Tri Haryoko and Jonsson, {Knud A.} and Qi Zhou and Martin Irestedt and Alexander Suh",
year = "2021",
doi = "10.1111/1755-0998.13252",
language = "English",
volume = "21",
pages = "263--286",
journal = "Molecular Ecology Resources",
issn = "1755-098X",
publisher = "Wiley-Blackwell",
number = "1",

}

RIS

TY - JOUR

T1 - Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

AU - Peona, Valentina

AU - Blom, Mozes P. K.

AU - Xu, Luohao

AU - Burri, Reto

AU - Sullivan, Shawn

AU - Bunikis, Ignas

AU - Liachko, Ivan

AU - Haryoko, Tri

AU - Jonsson, Knud A.

AU - Zhou, Qi

AU - Irestedt, Martin

AU - Suh, Alexander

PY - 2021

Y1 - 2021

N2 - Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.

AB - Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.

KW - chromosome-level assembly

KW - GC content

KW - genome assembly

KW - Hi-C

KW - long reads

KW - satellite repeat

KW - transposable element

KW - TRANSPOSABLE ELEMENTS

KW - LIBRARY PREPARATION

KW - HIDDEN GENES

KW - LONG-READ

KW - IN-VITRO

KW - G4 DNA

KW - NOVO

KW - ANNOTATION

KW - EVOLUTION

KW - RNA

U2 - 10.1111/1755-0998.13252

DO - 10.1111/1755-0998.13252

M3 - Journal article

C2 - 32937018

VL - 21

SP - 263

EP - 286

JO - Molecular Ecology Resources

JF - Molecular Ecology Resources

SN - 1755-098X

IS - 1

ER -

ID: 250540911