Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

Cited by: 18
Authors
Chaudhary, Ruchi [1 ,2 ]
Burleigh, John Gordon [2 ]
Fernandez-Baca, David [1 ]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] Univ Florida, Dept Biol, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation
Keywords
ANGIOSPERM PHYLOGENY; MAXIMUM-LIKELIHOOD; ALGORITHMS;
DOI
10.1186/1748-7188-8-28
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704
Abstract
Background: Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation.
Results: We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees.
Conclusions: Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets.
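The objective stated in the abstract (a singly-labeled species tree scored by its total RF distance to a collection of mul-trees) can be illustrated with a small sketch. This is not the MulRF software and may not match the paper's exact definition of the distance: it assumes rooted trees written as nested tuples, and it assumes that a mul-tree can be compared with a singly-labeled tree by the sets of labels found below each internal node, so repeated labels collapse. The names `clusters`, `rf_distance`, and `total_rf` are hypothetical.

```python
def clusters(tree):
    """Collect the label set below each internal node of a rooted tree.

    A tree is a leaf label (str) or a tuple of subtrees. Repeated labels
    collapse because clusters are sets; this is a simplifying assumption,
    not necessarily the definition used by MulRF.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        labels = frozenset().union(*(walk(child) for child in node))
        if len(labels) > 1:  # skip trivial one-label clusters, e.g. (A, A)
            found.add(labels)
        return labels

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """RF-style distance: number of clusters found in exactly one tree."""
    return len(clusters(tree_a) ^ clusters(tree_b))


def total_rf(species_tree, gene_trees):
    """Score a candidate species tree against a collection of gene trees."""
    return sum(rf_distance(species_tree, g) for g in gene_trees)


# Toy data (made up for illustration): gene_1 has two copies of label A.
species = (("A", "B"), ("C", "D"))
gene_1 = ((("A", "A"), "B"), ("C", "D"))
gene_2 = ((("A", "C"), "B"), "D")

print(rf_distance(species, gene_1))
print(rf_distance(species, gene_2))
print(total_rf(species, [gene_1, gene_2]))
```

A search heuristic such as the SPR-based one mentioned in the abstract would evaluate a score of this kind for many candidate species trees and keep the lowest-scoring one; the sketch covers only the scoring step.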
Pages: 12
Related papers
29 in total
  • [1] Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance
    Ruchi Chaudhary
    John Gordon Burleigh
    David Fernández-Baca
    [J]. Algorithms for Molecular Biology, 8
  • [2] A generalized Robinson-Foulds distance for labeled trees
    Briand, Samuel
    Dessimoz, Christophe
    El-Mabrouk, Nadia
    Lafond, Manuel
    Lobinska, Gabriela
    [J]. BMC GENOMICS, 2020, 21 (Suppl 10)
  • [3] The Generalized Robinson-Foulds Distance for Phylogenetic Trees
    Llabres, Merce
    Rossello, Francesc
    Valiente, Gabriel
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2021, 28 (12) : 1181 - 1195
  • [4] A generalized Robinson-Foulds distance for labeled trees
    Samuel Briand
    Christophe Dessimoz
    Nadia El-Mabrouk
    Manuel Lafond
    Gabriela Lobinska
    [J]. BMC Genomics, 21
  • [5] A Generalized Robinson-Foulds Distance for Clonal Trees, Mutation Trees, and Phylogenetic Trees and Networks
    Llabres, Merce
    Rossello, Francesc
    Valiente, Gabriel
    [J]. ACM-BCB 2020 - 11TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2020,
  • [6] Assessing Approaches for Inferring Species Trees from Multi-Copy Genes
    Chaudhary, Ruchi
    Boussau, Bastien
    Burleigh, J. Gordon
    Fernandez-Baca, David
    [J]. SYSTEMATIC BIOLOGY, 2015, 64 (02) : 325 - 339
  • [7] Species Trees from Highly Incongruent Gene Trees in Rice
    Cranston, Karen A.
    Hurwitz, Bonnie
    Ware, Doreen
    Stein, Lincoln
    Wing, Rod A.
    [J]. SYSTEMATIC BIOLOGY, 2009, 58 (05) : 489 - 500
  • [8] Improvements to a Class of Distance Matrix Methods for Inferring Species Trees from Gene Trees
    Helmkamp, Laura J.
    Jewett, Ethan M.
    Rosenberg, Noah A.
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (06) : 632 - 649
  • [9] MulRF: a software package for phylogenetic analysis using multi-copy gene trees
    Chaudhary, Ruchi
    Fernandez-Baca, David
    Burleigh, John Gordon
    [J]. BIOINFORMATICS, 2015, 31 (03) : 432 - 433
  • [10] ASTRAL-Pro 2: ultrafast species tree reconstruction from multi-copy gene family trees
    Zhang, Chao
    Mirarab, Siavash
    [J]. BIOINFORMATICS, 2022, 38 (21) : 4949 - 4950