MOCAT: a metagenomics assembly and gene prediction toolkit
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
MOCAT: a metagenomics assembly and gene prediction toolkit. / Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer.
In: PLOS ONE, Vol. 7, No. 10, 2012, p. e47656.
RIS
TY - JOUR
T1 - MOCAT
T2 - a metagenomics assembly and gene prediction toolkit
AU - Kultima, Jens Roat
AU - Sunagawa, Shinichi
AU - Li, Junhua
AU - Chen, Weineng
AU - Chen, Hua
AU - Mende, Daniel R.
AU - Arumugam, Manimozhiyan
AU - Pan, Qi
AU - Liu, Binghang
AU - Qin, Junjie
AU - Wang, Jun
AU - Bork, Peer
PY - 2012
Y1 - 2012
N2 - MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
KW - Computational Biology
KW - Computer Simulation
KW - Databases, Genetic
KW - Gastrointestinal Tract
KW - Genes
KW - Humans
KW - Metagenome
KW - Metagenomics
KW - Reference Standards
KW - Sequence Analysis, DNA
KW - Software
KW - Statistics as Topic
U2 - 10.1371/journal.pone.0047656
DO - 10.1371/journal.pone.0047656
M3 - Journal article
C2 - 23082188
VL - 7
SP - e47656
JO - PLoS ONE
JF - PLoS ONE
SN - 1932-6203
IS - 10
ER -