Using Constrained-INC for Large-Scale Gene Tree and Species Tree Estimation

Cited by: 3
Authors
Le, Thien [1]
Sy, Aaron [2]
Molloy, Erin K. [5]
Zhang, Qiuyi [3]
Rao, Satish [4]
Warnow, Tandy [5]
Affiliations
[1] MIT, Dept EECS, Cambridge, MA 02139 USA
[2] YouTube, San Bruno, CA 94066 USA
[3] Google Brain, Mountain View, CA 94043 USA
[4] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[5] Univ Illinois, Dept Comp Sci, Champaign, IL 61801 USA
Funding
US National Science Foundation
Keywords
Phylogeny; gene tree; species tree; maximum likelihood; divide-and-conquer
DOI
10.1109/TCBB.2020.2990867
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Incremental tree building (INC) is a new phylogeny estimation method that has been proven to be absolute fast converging under standard sequence evolution models. A variant of INC, called Constrained-INC, is designed for use in divide-and-conquer pipelines for phylogeny estimation, where a set of species is divided into disjoint subsets, trees are computed on the subsets using a selected base method, and the subset trees are then combined. We evaluate the accuracy of INC and Constrained-INC for gene tree and species tree estimation on simulated datasets, and compare them to similar pipelines using NJMerge (another method that merges disjoint trees). For gene tree estimation, we find that INC has very poor accuracy in comparison to standard methods, and even Constrained-INC (using maximum likelihood methods to compute constraint trees) does not match the accuracy of the better maximum likelihood methods. Results for species trees are somewhat different, with Constrained-INC coming close to the accuracy of the best species tree estimation methods while being much faster; furthermore, using Constrained-INC allows species tree estimation methods to scale to large datasets within limited computational resources. Overall, this study exposes the benefits and limitations of divide-and-conquer strategies for large-scale phylogenetic tree estimation.
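In a disjoint tree merger pipeline, the combined tree on the full species set is meant to agree with every subset (constraint) tree. In other words, restricting the merged tree to one subset's species should recover that subset's tree. The Python sketch below is our own illustration, not code from Constrained-INC or NJMerge. It checks this property by comparing bipartitions (splits). The nested-tuple tree encoding and the function names `splits` and `agrees_with_constraints` are hypothetical choices made for this example.

```python
def _clusters(tree, out):
    """Append the leaf set below every node of `tree` to `out`; return the root's leaf set."""
    if isinstance(tree, str):  # a leaf is a species label
        below = frozenset([tree])
    else:
        below = frozenset().union(*(_clusters(child, out) for child in tree))
    out.append(below)
    return below


def splits(tree, taxa=None):
    """Return the nontrivial bipartitions of `tree` (treated as unrooted),
    optionally restricted to the species in `taxa`.

    Each bipartition A|B is keyed by the side that does NOT contain a fixed
    reference species, so the same split always gets the same key.
    """
    nodes = []
    all_taxa = _clusters(tree, nodes)
    taxa = all_taxa if taxa is None else frozenset(taxa) & all_taxa
    ref = min(taxa)
    result = set()
    for cluster in nodes:
        side = cluster & taxa
        other = taxa - side
        # Restricting an edge's split to `taxa` gives a split of the induced
        # subtree; it is nontrivial only if both sides keep >= 2 species.
        if len(side) >= 2 and len(other) >= 2:
            result.add(other if ref in side else side)
    return result


def agrees_with_constraints(merged, constraint_trees):
    """True if `merged`, restricted to each constraint tree's species, has that tree's topology."""
    merged_taxa = _clusters(merged, [])
    for constraint in constraint_trees:
        taxa = _clusters(constraint, [])
        if not taxa <= merged_taxa:
            return False  # the merged tree is missing some species
        if splits(merged, taxa) != splits(constraint):
            return False
    return True


# Hypothetical example: two constraint trees on disjoint species subsets.
merged = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
t1 = (("a", "c"), ("e", "g"))
t2 = (("b", "d"), ("f", "h"))
print(agrees_with_constraints(merged, [t1, t2]))                   # True
print(agrees_with_constraints(merged, [(("b", "f"), ("d", "h"))]))  # False
```

Two unrooted trees on the same species set have the same topology exactly when they share the same nontrivial splits. For this reason, the check compares split sets rather than the nested tuples themselves, which would depend on where each tree happens to be rooted.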
Pages: 2-15
Page count: 14