How Well Does Your Phylogenetic Model Fit Your Data?

Cited by: 14
Authors
Shepherd, Daisy A. [1]
Klaere, Steffen [1, 2]
Affiliations
[1] Univ Auckland, Dept Stat, Private Bag 92019, Auckland 1142, New Zealand
[2] Univ Auckland, Sch Biol Sci, Auckland, New Zealand
Keywords
GOODNESS-OF-FIT; MAXIMUM-LIKELIHOOD; STATISTICAL TESTS; SEQUENCE DATA; TREE; SITES; EVOLUTION; SELECTION; RECONSTRUCTION; CHARACTERS
DOI
10.1093/sysbio/syy066
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The test for model-to-data fitness is a fundamental principle within the statistical sciences. The purpose of such a test is to assess whether the selected best-fitting model adequately describes the behavior in the data. Despite their broad application across many areas of statistics, goodness of fit tests for phylogenetic models have received much less attention than model selection methods in the last decade. At present a number of approaches have been suggested. However, these are often flawed, with problems ranging from the presence of systematic error in the models themselves to the difficulties presented by the nature of phylogenetic data. Ultimately these problems lead to an inadequate choice of statistic. This is one of the main reasons why goodness of fit assessment is often a neglected step within phylogenetic analysis. We argue not only for the necessity of these goodness of fit measures to test how well the model reflects the data, but additionally for the need for useful tests that explain why the model-to-data fit may be inadequate. Such tests are a critical part of the model building process, allowing the model to be adapted to provide a better model-to-data fit or to reject a model class outright due to such an inadequate fit that the intended use of the class may be compromised. Proposed and existing methods in both the maximum likelihood and Bayesian framework will be discussed here, whilst highlighting their strengths and limitations for assessing goodness of fit. The final section discusses some critical open statistical problems in goodness of fit assessment for this field, with the hope of encouraging more research into such a fundamental yet underdeveloped area of phylogenetic inference. [Bayesian phylogenetics; Goodness of fit; maximum likelihood; molecular phylogenetics; outlier detection; residual diagnostics.].
Pages: 157-167
Page count: 11