Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection. / Yang, Ziheng; Wong, Wendy Shuk Wan; Nielsen, Rasmus.
In: Molecular Biology and Evolution, Vol. 22, No. 4, 2005, pp. 1107-1118.
RIS
TY - JOUR
T1 - Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection
AU - Yang, Ziheng
AU - Wong, Wendy Shuk Wan
AU - Nielsen, Rasmus
N1 - Key Words: positive selection • codon-substitution models • Bayes empirical Bayes
PY - 2005
Y1 - 2005
AB - Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
U2 - 10.1093/molbev/msi097
DO - 10.1093/molbev/msi097
M3 - Journal article
C2 - 15689528
VL - 22
SP - 1107
EP - 1118
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
SN - 0737-4038
IS - 4
ER -
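The EB step described in the abstract is Bayes' theorem applied to each site h. The posterior probability of site class k is P(k | x_h) = p_k f(x_h | ω_k) / Σ_j p_j f(x_h | ω_j). NEB plugs point estimates of the proportions p and ratios ω into this formula. BEB instead averages it over a prior on those parameters, weighted by their posterior given the whole alignment.

The Python sketch below contrasts the two approaches under heavily simplified, hypothetical assumptions, and it is not the paper's implementation:
- a three-class mixture (ω0 < 1, ω1 = 1, ω2 > 1);
- a toy Poisson "site likelihood" standing in for the real codon-model likelihood computed on a phylogeny;
- an illustrative uniform grid prior.

The function names, the `mu` parameter, and the grid ranges are all invented for illustration.

```python
import math

import numpy as np


def site_likelihoods(counts, omegas, mu=2.0):
    """Hypothetical toy stand-in for codon-model site likelihoods f(x_h | omega).

    Each site is summarised by a nonsynonymous count n_h, assumed Poisson with
    mean omega * mu (mu is an illustrative constant). Returns (n_sites, n_classes).
    """
    n = np.asarray(counts, dtype=float)[:, None]
    lam = np.asarray(omegas, dtype=float)[None, :] * mu
    log_fact = np.array([math.lgamma(v + 1.0) for v in n[:, 0]])[:, None]
    return np.exp(n * np.log(lam) - lam - log_fact)


def neb_posterior(counts, props, omegas):
    """Naive EB: plug in point estimates of (p, omega), apply Bayes' theorem per site."""
    f = site_likelihoods(counts, omegas) * np.asarray(props, dtype=float)[None, :]
    return f / f.sum(axis=1, keepdims=True)


def beb_posterior(counts, grid=10):
    """BEB-style sketch: average per-site posteriors over a discrete prior grid.

    Assumed (illustrative) prior: (p0, p1, p2) uniform over simplex grid points,
    omega0 uniform on (0, 1), omega1 fixed at 1, omega2 uniform on (1, 11),
    each omega discretised at category midpoints.
    """
    w0_grid = (np.arange(grid) + 0.5) / grid                # midpoints in (0, 1)
    w2_grid = 1.0 + 10.0 * (np.arange(grid) + 0.5) / grid   # midpoints in (1, 11)
    # Integer simplex points avoid tiny negative p2 from floating-point rounding.
    p_grid = [np.array([i, j, grid - i - j]) / grid
              for i in range(grid + 1) for j in range(grid + 1 - i)]

    log_lik, site_post = [], []
    for props in p_grid:
        for w0 in w0_grid:
            for w2 in w2_grid:
                f = site_likelihoods(counts, [w0, 1.0, w2]) * props[None, :]
                mix = f.sum(axis=1)                   # mixture likelihood of each site
                log_lik.append(np.log(mix).sum())     # flat prior: posterior ∝ likelihood
                site_post.append(f / mix[:, None])    # P(class | x_h, eta)

    log_lik = np.array(log_lik)
    weights = np.exp(log_lik - log_lik.max())         # stabilised P(eta | X), unnormalised
    weights /= weights.sum()
    # Sum over grid points: sum_eta P(eta | X) * P(class | x_h, eta).
    return np.tensordot(weights, np.array(site_post), axes=1)
```

For example, `neb_posterior([0, 12], [0.6, 0.3, 0.1], [0.1, 1.0, 5.0])` assigns the second site a posterior close to 1 for the ω > 1 class. That call takes the parameter values as given and ignores their uncertainty. `beb_posterior` replaces the single plug-in value with a likelihood-weighted average over the grid. This is the kind of integration over parameter uncertainty that the abstract credits with avoiding false positives in small data sets.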