Investigating the performance of AIC in selecting phylogenetic models

Cited by: 8
Authors
Jhwueng, Dwueng-Chwuan [3 ]
Huzurbazar, Snehalata [4 ,5 ,6 ]
O'Meara, Brian C. [7 ]
Liu, Liang [1 ,2 ]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30606 USA
[2] Univ Georgia, Inst Bioinformat, Athens, GA 30606 USA
[3] Feng Chia Univ, Dept Stat, Taichung 40724, Taiwan
[4] Stat & Appl Math Sci Inst, Res Triangle Pk, NC 27709 USA
[5] Univ Wyoming, Dept Stat, Laramie, WY 82071 USA
[6] N Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[7] Univ Tennessee, Dept Ecol & Evolutionary Biol, Knoxville, TN 37996 USA
Funding
U.S. National Science Foundation
Keywords
AIC; Kullback-Leibler divergence; model selection; phylogenetics; AKAIKE INFORMATION CRITERION; LIKELIHOOD-RATIO TEST; SUBSTITUTION MODELS; DNA-SEQUENCES; EVOLUTION; JMODELTEST; ACCURATE; TESTS; RATES;
DOI
10.1515/sagmb-2013-0048
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The popular likelihood-based model selection criterion, Akaike's Information Criterion (AIC), is a breakthrough mathematical result derived from information theory. AIC is an approximation to the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. However, for phylogenetic estimation, given that tree space is discrete with respect to tree topology, the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. However, when the tree topology is unknown, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can result in selecting the wrong model. As the choice of phylogenetic models is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
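For readers unfamiliar with the criterion, the standard AIC rule the abstract refers to can be sketched as follows. This is only an illustration of the textbook formula AIC = 2k - 2 ln L, not the authors' code: the log-likelihood values are made up, the model names are used purely as labels, and the parameter counts are assumed to cover only the free substitution-model parameters.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """Textbook AIC: 2k - 2 ln L, where ln L is the maximized log-likelihood."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical (maximized log-likelihood, free parameter count) pairs.
# These numbers are invented for illustration and come from no real data set.
candidates = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5180.2, 4),
    "GTR": (-5178.9, 8),
}

# Score every candidate and pick the one with the smallest AIC.
scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("Selected model:", best)
```

Note that this plain minimum-AIC rule is exactly what the abstract calls into question: when the tree topology must also be estimated, AIC tends to underestimate the expected KL divergence, so the model with the smallest score is not guaranteed to be the right one.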
Pages: 459-475
Page count: 17