Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites

Cited by: 155
Authors
Anisimova, Maria [1]
Yang, Ziheng
Affiliations
[1] UCL, Dept Biol, London, England
[2] UCL, Ctr Math Phys Life Sci & Expt Biol, London, England
Keywords
multiple hypothesis testing; family-wise error rate (FWER); false discovery rate (FDR); positive selection; branch-site model; molecular adaptation
DOI
10.1093/molbev/msm042
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the family-wise error rate (FWER), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is usable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
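The contrast between FWER and FDR control described in the abstract can be illustrated with a minimal sketch. The abstract names Bonferroni but does not name the 2 FDR procedures, so the Benjamini-Hochberg step-up rule used here is an assumption chosen as a common FDR method, not necessarily one evaluated in the paper. The p-values are made up for illustration (one hypothetical likelihood ratio test per branch) and are not taken from the study.

```python
def bonferroni(pvalues, alpha=0.05):
    """FWER control: reject H0 for a branch only if p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """FDR control (assumed procedure): step-up rule on sorted p-values."""
    m = len(pvalues)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values, one per tested branch (illustrative only).
branch_pvalues = [0.001, 0.012, 0.02, 0.04, 0.3, 0.6, 0.9]

fwer_calls = bonferroni(branch_pvalues)
fdr_calls = benjamini_hochberg(branch_pvalues)

print("Bonferroni rejections:", sum(fwer_calls))
print("Benjamini-Hochberg rejections:", sum(fdr_calls))
```

On these made-up inputs the FDR rule flags more branches than Bonferroni, which matches the direction of the trade-off reported in the abstract: somewhat more power in exchange for a higher family-wise error rate.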
Pages: 1219-1228
Number of pages: 10