Investigating the performance of AIC in selecting phylogenetic models

Cited by: 8
Authors
Jhwueng, Dwueng-Chwuan [3]
Huzurbazar, Snehalata [4, 5, 6]
O'Meara, Brian C. [7]
Liu, Liang [1, 2]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30606 USA
[2] Univ Georgia, Inst Bioinformat, Athens, GA 30606 USA
[3] Feng Chia Univ, Dept Stat, Taichung 40724, Taiwan
[4] Stat & Appl Math Sci Inst, Res Triangle Pk, NC 27709 USA
[5] Univ Wyoming, Dept Stat, Laramie, WY 82071 USA
[6] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[7] Univ Tennessee, Dept Ecol & Evolutionary Biol, Knoxville, TN 37996 USA
Funding
National Science Foundation (USA)
Keywords
AIC; Kullback-Leibler divergence; model selection; phylogenetics; AKAIKE INFORMATION CRITERION; LIKELIHOOD-RATIO TEST; SUBSTITUTION MODELS; DNA-SEQUENCES; EVOLUTION; JMODELTEST; ACCURATE; TESTS; RATES;
DOI
10.1515/sagmb-2013-0048
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The popular likelihood-based model selection criterion, Akaike's Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC is an approximation to Kullback-Leibler (KL) divergence with the derivation relying on the assumption that the likelihood function has finite second derivatives. However, for phylogenetic estimation, given that tree space is discrete with respect to tree topology, the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model, and the expected KL divergence in the context of phylogenetic tree estimation. We find that given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models so that even for large sample sizes, the bias of AIC can result in selecting a wrong model. As the choice of phylogenetic models is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
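As a minimal sketch of the criterion the abstract discusses, the snippet below computes AIC = 2k - 2 ln L for three nucleotide substitution models on a fixed tree topology and picks the model with the smallest score. The log-likelihood values are made-up placeholders for illustration only, not results from the paper, and the helper names (`aic`, `n_free_params`) are hypothetical. The parameter count assumes an unrooted binary tree with 2n - 3 branch lengths and treats the topology as known, so the topology itself adds nothing to the penalty.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def n_free_params(n_substitution_params, n_taxa):
    """Free parameters on a FIXED unrooted binary tree.

    Assumption: one branch length per edge (2n - 3 edges); the topology
    is treated as known and is not counted as a parameter.
    """
    return n_substitution_params + (2 * n_taxa - 3)


n_taxa = 5

# (maximized log-likelihood, free substitution parameters)
# The log-likelihoods are illustrative placeholders, NOT values from the paper.
# Parameter counts: JC69 has 0, HKY85 has 4 (kappa + 3 base frequencies),
# GTR has 8 (5 relative rates + 3 base frequencies).
candidates = {
    "JC69": (-2050.0, 0),
    "HKY85": (-2040.0, 4),
    "GTR": (-2038.5, 8),
}

scores = {
    name: aic(log_lik, n_free_params(k, n_taxa))
    for name, (log_lik, k) in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("selected:", best)
```

With these placeholder numbers the richer GTR model has the highest likelihood but loses to HKY85 once the 2k penalty is applied. The abstract's concern is the case this sketch leaves out: when the topology is also estimated, the same penalty is reported to underestimate the expected KL divergence.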
Pages: 459-475
Number of pages: 17