Fast admixture analysis and population tree estimation for SNP and NGS data
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
Fast admixture analysis and population tree estimation for SNP and NGS data. / Cheng, Jade Yu; Mailund, Thomas; Nielsen, Rasmus.
In: Bioinformatics, Vol. 33, No. 14, 15.07.2017, pp. 2148-2155.
RIS
TY - JOUR
T1 - Fast admixture analysis and population tree estimation for SNP and NGS data
AU - Cheng, Jade Yu
AU - Mailund, Thomas
AU - Nielsen, Rasmus
PY - 2017/7/15
Y1 - 2017/7/15
N2 - Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components. Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, that can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios e.g. with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana.
UR - http://www.scopus.com/inward/record.url?scp=85024488622&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx098
DO - 10.1093/bioinformatics/btx098
M3 - Journal article
C2 - 28334108
AN - SCOPUS:85024488622
VL - 33
SP - 2148
EP - 2155
JO - Computer Applications in the Biosciences
JF - Computer Applications in the Biosciences
SN - 1471-2105
IS - 14
ER -
ID: 181388792