The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

  • Zhiwen Wang
  • Neil Hobson
  • Leonardo Galindo
  • Shilin Zhu
  • Daihu Shi
  • Joshua McDill
  • Linfeng Yang
  • Simon Hawkins
  • Godfrey Neutelings
  • Raju Datla
  • Georgina Lambert
  • David W. Galbraith
  • Christopher J. Grassa
  • Armando Geraldes
  • Quentin C. Cronk
  • Christopher Cullis
  • Prasanta K. Dash
  • Polumetla A. Kumar
  • Sylvie Cloutier
  • Andrew G. Sharpe
  • Gane K.-S. Wong
  • Jun Wang
  • Michael K. Deyholos
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together, these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.
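The N50 figures quoted in the abstract follow the usual definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that definition only. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (made up, not flax data); they sum to 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With these toy lengths, the two longest sequences (80 and 70) already account for half of the 300 total bases, so the reported N50 is 70.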
Original language: English
Journal: Plant Journal
Volume: 72
Issue number: 3
Pages (from-to): 461-473
Number of pages: 13
ISSN: 0960-7412
DOI
Status: Published - 2012

ID: 46232924