Benchmarking of methods for genomic taxonomy

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Benchmarking of methods for genomic taxonomy. / Larsen, Mette V; Cosentino, Salvatore; Lukjancenko, Oksana; Saputra, Dhany; Rasmussen, Simon; Hasman, Henrik; Sicheritz-Pontén, Thomas; Aarestrup, Frank M; Ussery, David W; Lund, Ole.

In: Journal of Clinical Microbiology, Vol. 52, No. 5, 2014, p. 1529-39.

Harvard

Larsen, MV, Cosentino, S, Lukjancenko, O, Saputra, D, Rasmussen, S, Hasman, H, Sicheritz-Pontén, T, Aarestrup, FM, Ussery, DW & Lund, O 2014, 'Benchmarking of methods for genomic taxonomy', Journal of Clinical Microbiology, vol. 52, no. 5, pp. 1529-39. https://doi.org/10.1128/JCM.02981-13

APA

Larsen, M. V., Cosentino, S., Lukjancenko, O., Saputra, D., Rasmussen, S., Hasman, H., Sicheritz-Pontén, T., Aarestrup, F. M., Ussery, D. W., & Lund, O. (2014). Benchmarking of methods for genomic taxonomy. Journal of Clinical Microbiology, 52(5), 1529-1539. https://doi.org/10.1128/JCM.02981-13

Vancouver

Larsen MV, Cosentino S, Lukjancenko O, Saputra D, Rasmussen S, Hasman H et al. Benchmarking of methods for genomic taxonomy. Journal of Clinical Microbiology. 2014;52(5):1529-39. https://doi.org/10.1128/JCM.02981-13

Author

Larsen, Mette V ; Cosentino, Salvatore ; Lukjancenko, Oksana ; Saputra, Dhany ; Rasmussen, Simon ; Hasman, Henrik ; Sicheritz-Pontén, Thomas ; Aarestrup, Frank M ; Ussery, David W ; Lund, Ole. / Benchmarking of methods for genomic taxonomy. In: Journal of Clinical Microbiology. 2014 ; Vol. 52, No. 5. pp. 1529-39.

Bibtex

@article{1d6bbf75ba574e95bd4ae5046320711f,
title = "Benchmarking of methods for genomic taxonomy",
abstract = "One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is--that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of cooccurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets constituted sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. The KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.",
keywords = "Archaea/genetics, Bacteria/genetics, Bacterial Proteins/genetics, Benchmarking/methods, Classification/methods, DNA, Bacterial/genetics, Genomics/methods, Multilocus Sequence Typing/methods, RNA, Ribosomal, 16S/genetics",
author = "Larsen, {Mette V} and Salvatore Cosentino and Oksana Lukjancenko and Dhany Saputra and Simon Rasmussen and Henrik Hasman and Thomas Sicheritz-Pont{\'e}n and Aarestrup, {Frank M} and Ussery, {David W} and Ole Lund",
year = "2014",
doi = "10.1128/JCM.02981-13",
language = "English",
volume = "52",
pages = "1529--1539",
journal = "Journal of Clinical Microbiology",
issn = "0095-1137",
publisher = "American Society for Microbiology",
number = "5",

}
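The abstract summarizes KmerFinder as scoring a sample by the number of k-mers (substrings of k nucleotides) it shares with reference genomes. The Python sketch here illustrates only that general idea under stated assumptions; it is not KmerFinder's implementation. The function names (`kmers`, `build_database`, `rank_species`), the toy sequences labelled `species_A` and `species_B`, and the choice k = 5 are all hypothetical, picked so the result can be checked by hand.

```python
# Minimal sketch of shared-k-mer species ranking (hypothetical; not KmerFinder's code).


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k (k-mers) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_database(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Index each (hypothetical) reference sequence by its k-mer set."""
    return {name: kmers(seq, k) for name, seq in references.items()}


def rank_species(query: str, db: dict[str, set[str]], k: int) -> list[tuple[str, int]]:
    """Score each reference by how many k-mers it shares with the query.

    Results are sorted best match first; ties are broken by name so the
    output is deterministic.
    """
    q = kmers(query, k)
    scores = [(name, len(q & ref)) for name, ref in db.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy "genomes"; real references would be complete genome sequences.
references = {
    "species_A": "ACGTACGTTAGCCGATAGGCT",
    "species_B": "TTGACCGGTAACGTTAGGCAA",
}
db = build_database(references, k=5)

# The query is a fragment of species_A, so it shares all 10 of its 5-mers with A.
ranking = rank_species("CGTTAGCCGATAGG", db, k=5)
print(ranking)  # [('species_A', 10), ('species_B', 2)]
```

Scoring by a raw count of shared k-mers tends to favour larger references, since they simply contain more k-mers; a practical tool would likely also normalize for query coverage, use a larger k, and index complete genomes. Those refinements are left out of this sketch.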

RIS

TY  - JOUR
T1  - Benchmarking of methods for genomic taxonomy
AU  - Larsen, Mette V
AU  - Cosentino, Salvatore
AU  - Lukjancenko, Oksana
AU  - Saputra, Dhany
AU  - Rasmussen, Simon
AU  - Hasman, Henrik
AU  - Sicheritz-Pontén, Thomas
AU  - Aarestrup, Frank M
AU  - Ussery, David W
AU  - Lund, Ole
PY  - 2014
Y1  - 2014
AB  - One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is--that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of cooccurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets constituted sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. The KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.
KW  - Archaea/genetics
KW  - Bacteria/genetics
KW  - Bacterial Proteins/genetics
KW  - Benchmarking/methods
KW  - Classification/methods
KW  - DNA, Bacterial/genetics
KW  - Genomics/methods
KW  - Multilocus Sequence Typing/methods
KW  - RNA, Ribosomal, 16S/genetics
U2  - 10.1128/JCM.02981-13
DO  - 10.1128/JCM.02981-13
M3  - Journal article
C2  - 24574292
VL  - 52
SP  - 1529
EP  - 1539
JO  - Journal of Clinical Microbiology
JF  - Journal of Clinical Microbiology
SN  - 0095-1137
IS  - 5
ER  - 