Assessing Approaches for Inferring Species Trees from Multi-Copy Genes

Cited by: 11
Authors
Chaudhary, Ruchi [1, 2]
Boussau, Bastien [3]
Burleigh, J. Gordon [2]
Fernandez-Baca, David [1]
Affiliations
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] Univ Florida, Dept Biol, Gainesville, FL 32611 USA
[3] Univ Lyon 1, CNRS, UMR 5558, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
Funding
National Science Foundation (USA);
Keywords
Deep coalescence; gene duplication; gene loss; gene tree parsimony; MulRF; NJ(st); PHYLDOG; MAXIMUM-LIKELIHOOD; PHYLOGENETIC ANALYSIS; DUPLICATION EVENTS; RECONCILED TREES; EVOLUTION; ALGORITHMS; INFERENCE; RECONSTRUCTION; SEQUENCES; SOFTWARE;
DOI
10.1093/sysbio/syu128
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJ(st) distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error. Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses.
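The duplication cost that underlies the Only-dup variant of gene tree parsimony can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples whose leaves are species names, and it uses the standard LCA mapping: a gene tree node counts as a duplication when it maps to the same species tree node as one of its children. The function names (`clades`, `lca`, `duplication_cost`) and the tree encoding are illustrative assumptions, not the software evaluated in the study.

```python
def clades(tree):
    """Return all clades (frozensets of leaf labels) of a nested-tuple tree.

    Clades are listed in post-order, so the last entry is the root clade.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # last entry of sub is the child's own clade
    result.append(leaves)
    return result


def lca(species_clades, species_set):
    """Smallest species tree clade containing species_set (the LCA mapping)."""
    return min((c for c in species_clades if species_set <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species node as a child."""
    s_clades = clades(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            leaf = frozenset([node])
            return leaf, lca(s_clades, leaf)
        child_results = [visit(child) for child in node]
        species = frozenset().union(*(sp for sp, _ in child_results))
        mapping = lca(s_clades, species)
        # A node is a duplication if any child maps to the same species node.
        if any(m == mapping for _, m in child_results):
            dups += 1
        return species, mapping

    visit(gene_tree)
    return dups


species_tree = (("A", "B"), "C")
congruent = (("A", "B"), "C")
two_copies = ((("A", "B"), "C"), (("A", "B"), "C"))
print(duplication_cost(congruent, species_tree))
print(duplication_cost(two_copies, species_tree))
```

In a gene tree parsimony search, a cost of this kind would be summed over all input gene trees for each candidate species tree, and the candidate with the lowest total would be preferred. The clade-scanning LCA here is quadratic and chosen only for readability.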
Pages: 325-339
Page count: 15