Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees

Cited by: 31
Authors
Zhang, Chao [1]
Mirarab, Siavash [2]
Affiliations
[1] Univ Calif San Diego, Bioinformat & Syst Biol, La Jolla, CA USA
[2] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
Funding
U.S. National Science Foundation
Keywords
phylogenomics; ILS; summary methods; ASTRAL; gene tree estimation error; ultraconserved elements; data sets; bootstrap; inference; reconstruction; concatenation; simulation; challenges; support
DOI
10.1093/molbev/msac215
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
Pages: 22