Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree Terraces

Cited by: 2
Authors
Farah, Ishrat Tanzila [1]
Islam, Muktadirul [2]
Zinat, Kazi Tasnim [1, 3]
Rahman, Atif Hasan [1]
Bayzid, Shamsuzzoha [1]
Affiliations
[1] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, Dhaka 1205, Bangladesh
[2] Jahangirnagar Univ, Dept Stat, Appl Stat & Data Sci ASDS, Dhaka 1342, Bangladesh
[3] Univ Maryland, Dept Comp Sci, 8125 Paint Branch Dr, College Pk, MD 20742 USA
Keywords
Gene tree; incomplete lineage sorting; phylogenomic analysis; species tree; summary method; phylogenetic analysis; maximum likelihood; reconstruction; inference; probability; concordance; complexity; algorithm; model
DOI
10.1093/sysbio/syab026
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Species tree estimation from multilocus data sets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed which estimate gene trees and then combine the gene trees to estimate a species tree by optimizing various optimality criteria. In this study, we have extended and adapted the concept of phylogenetic terraces to species tree estimation by "summarizing" a set of gene trees, where multiple species trees with distinct topologies may have exactly the same optimality score (e.g., quartet score or extra lineage score). We particularly investigated the presence and impacts of equally optimal trees in species tree estimation from multilocus data using summary methods by taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. We present a comprehensive comparative study of these two optimality criteria. Our experiments, on a collection of data sets simulated under ILS, indicate that MDC may result in competitive or identical quartet consistency score as MQC, but could be significantly worse than MQC in terms of tree accuracy, demonstrating the presence and impacts of equally optimal species trees. This is the first known study that provides the conditions for the data sets to have equally optimal trees in the context of phylogenomic inference using summary methods.
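The notion of equally optimal species trees under a quartet score can be made concrete with a toy example. The sketch below is not taken from the article and does not reproduce its conditions for pseudo species tree terraces. It assumes trees are written as nested Python tuples of taxon names, and the helper names (`clades`, `quartet_topology`, `quartet_score`) are hypothetical. It counts, for a candidate species tree, how many quartets induced by the gene trees it agrees with, and shows two distinct topologies receiving the same score.

```python
from itertools import combinations

# Assumption: a tree is a nested tuple of taxon-name strings,
# e.g. (("A", "B"), ("C", "D")). All names here are illustrative.

def clades(tree):
    """Return (leaf set of tree, leaf sets of all proper subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
        found.append(child_leaves)
    return leaves, found

def quartet_topology(tree_clades, quartet):
    """Return the split (pair | pair) the tree induces on four taxa, or None."""
    q = frozenset(quartet)
    for clade in tree_clades:
        side = clade & q
        if len(side) == 2:  # this clade separates two taxa from the other two
            return frozenset([side, q - side])
    return None  # the tree does not resolve this quartet

def quartet_score(species_tree, gene_trees):
    """Count gene-tree quartets whose topology matches the species tree."""
    s_leaves, s_clades = clades(species_tree)
    score = 0
    for gene_tree in gene_trees:
        g_leaves, g_clades = clades(gene_tree)
        for quartet in combinations(sorted(s_leaves & g_leaves), 4):
            topology = quartet_topology(s_clades, quartet)
            if topology is not None and topology == quartet_topology(g_clades, quartet):
                score += 1
    return score

# Two conflicting gene trees on four taxa (made-up input).
gene_trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = [quartet_score(t, gene_trees) for t in candidates]
```

With this made-up input the first two candidates each agree with exactly one gene tree and so tie on quartet score, while the third agrees with neither. A score-only search therefore has no basis for preferring one of the tied topologies over the other.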
Pages: 1213-1231
Page count: 19