Challenges in Species Tree Estimation Under the Multispecies Coalescent Model

Cited by: 107
Authors
Xu, Bo [1]
Yang, Ziheng [1, 2]
Affiliations
[1] Chinese Acad Sci, Beijing Inst Genom, Beijing 100101, Peoples R China
[2] UCL, Dept Genet Evolut & Environm, Gower St, London WC1E 6BT, England
Keywords
anomaly zone; BPP; concatenation; gene trees; incomplete lineage sorting; maximum likelihood; multispecies coalescent; species trees; ANCESTRAL POPULATION SIZES; GENE TREES; DNA-SEQUENCES; PHYLOGENETIC ANALYSIS; MAXIMUM-LIKELIHOOD; BAYESIAN-INFERENCE; DIVERGENCE TIME; GENOME; CONSISTENCY; SPECIATION
DOI
10.1534/genetics.116.190173
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. 
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
Pages: 1353-1368
Page count: 16