Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. / Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert.

In: Scientific Reports, Vol. 6, 36671, 2016.

Harvard

Mieth, B, Kloft, M, Rodríguez, JA, Sonnenburg, S, Vobruba, R, Morcillo-Suárez, C, Farré, X, Marigorta, UM, Fehr, E, Dickhaus, T, Blanchard, G, Schunk, D, Navarro, A & Müller, K-R 2016, 'Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies', Scientific Reports, vol. 6, 36671. https://doi.org/10.1038/srep36671

APA

Mieth, B., Kloft, M., Rodríguez, J. A., Sonnenburg, S., Vobruba, R., Morcillo-Suárez, C., Farré, X., Marigorta, U. M., Fehr, E., Dickhaus, T., Blanchard, G., Schunk, D., Navarro, A., & Müller, K.-R. (2016). Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Scientific Reports, 6, Article 36671. https://doi.org/10.1038/srep36671

Vancouver

Mieth B, Kloft M, Rodríguez JA, Sonnenburg S, Vobruba R, Morcillo-Suárez C et al. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Scientific Reports. 2016;6:36671. https://doi.org/10.1038/srep36671

Author

Mieth, Bettina ; Kloft, Marius ; Rodríguez, Juan Antonio ; Sonnenburg, Sören ; Vobruba, Robin ; Morcillo-Suárez, Carlos ; Farré, Xavier ; Marigorta, Urko M. ; Fehr, Ernst ; Dickhaus, Thorsten ; Blanchard, Gilles ; Schunk, Daniel ; Navarro, Arcadi ; Müller, Klaus-Robert. / Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. In: Scientific Reports. 2016 ; Vol. 6.

Bibtex

@article{2e2368a91fb3429cb945207e7855298a,
title = "Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies",
abstract = "The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.",
author = "Bettina Mieth and Marius Kloft and Rodr{\'i}guez, {Juan Antonio} and S{\"o}ren Sonnenburg and Robin Vobruba and Carlos Morcillo-Su{\'a}rez and Xavier Farr{\'e} and Marigorta, {Urko M.} and Ernst Fehr and Thorsten Dickhaus and Gilles Blanchard and Daniel Schunk and Arcadi Navarro and Klaus-Robert M{\"u}ller",
note = "Funding Information: MK and KRM were financially supported by the Ministry of Education, Science, and Technology, through the National Research Foundation of Korea under Grant R31-10008 (MK, KRM) and BK21 (KRM).",
year = "2016",
doi = "10.1038/srep36671",
language = "English",
volume = "6",
pages = "36671",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
}

RIS

TY - JOUR

T1 - Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

AU - Mieth, Bettina

AU - Kloft, Marius

AU - Rodríguez, Juan Antonio

AU - Sonnenburg, Sören

AU - Vobruba, Robin

AU - Morcillo-Suárez, Carlos

AU - Farré, Xavier

AU - Marigorta, Urko M.

AU - Fehr, Ernst

AU - Dickhaus, Thorsten

AU - Blanchard, Gilles

AU - Schunk, Daniel

AU - Navarro, Arcadi

AU - Müller, Klaus-Robert

N1 - Funding Information: MK and KRM were financially supported by the Ministry of Education, Science, and Technology, through the National Research Foundation of Korea under Grant R31-10008 (MK, KRM) and BK21 (KRM).

PY - 2016

Y1 - 2016

N2 - The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

U2 - 10.1038/srep36671

DO - 10.1038/srep36671

M3 - Journal article

AN - SCOPUS:84999740191

VL - 6

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 36671

ER -
