A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection

Cited by: 57
Authors
Lemey, Philippe [1]
Minin, Vladimir N. [2]
Bielejec, Filip [1]
Pond, Sergei L. Kosakovsky [3]
Suchard, Marc A. [4, 5, 6]
Affiliations
[1] Katholieke Univ Leuven, Rega Inst, Dept Microbiol & Immunol, B-3000 Louvain, Belgium
[2] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[3] Univ Calif San Diego, Dept Med, San Diego, CA 92103 USA
[4] Univ Calif Los Angeles, David Geffen Sch Med, Dept Biomath, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, David Geffen Sch Med, Dept Human Genet, Los Angeles, CA 90095 USA
[6] Univ Calif Los Angeles, Sch Publ Hlth, Dept Biostat, Los Angeles, CA 90095 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA); European Research Council
Keywords
TYPE-1; REVERSE-TRANSCRIPTASE; HIGH-LEVEL RESISTANCE; NUCLEOTIDE SUBSTITUTION; LIKELIHOOD MODELS; MUTATIONS; ZIDOVUDINE; NUCLEOSIDE; SUSCEPTIBILITY; SIMULATION; EVOLUTION;
DOI
10.1093/bioinformatics/bts580
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
MOTIVATION: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. RESULTS: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples.
Pages: 3248-3256
Number of pages: 9