On distances between phylogenetic trees

Authors
He, X
Jiang, T
Li, M
Tromp, J
DOI
Not available
Chinese Library Classification
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Different phylogenetic trees for the same group of species are often produced either by procedures that use diverse optimality criteria [18] or from different genes [12] in the study of molecular evolution. Comparing these trees to find their similarities (e.g. agreement or consensus) and dissimilarities (i.e. distance) is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance [25, 24, 32, 4, 5, 3, 16, 17, 19, 29, 20, 21, 23] and the subtree-transfer distance [12, 13, 15] are two major distance metrics that have been proposed and extensively studied for different reasons. Despite their many appealing aspects, such as simplicity and sensitivity to tree topologies, computing these distances has remained very challenging. This article studies the complexity of, and efficient approximation algorithms for, computing the nni distance and a natural extension of the subtree-transfer distance, called the linear-cost subtree-transfer distance. The linear-cost subtree-transfer model is more logical than the (unit-cost) subtree-transfer model and in fact coincides with the nni model under certain conditions. The following results have been obtained as part of our project of building a comprehensive software package for computing distances between phylogenies.
1. Computing the nni distance is NP-complete. This solves a 25-year-old open question that appears again and again in, for example, [25, 32, 4, 5, 3, 16, 17, 19, 20, 21, 23], under the complexity-theoretic assumption that P ≠ NP. We also answer an open question [4] regarding the nni distance between unlabeled trees, for which an erroneous proof appeared in [19]. We give an algorithm that computes an optimal nni sequence in time $O(n^2 \log n + n \cdot 2^{O(d)})$, where the nni distance is at most d. The algorithm allows us to implement practical programs when d is small. All of the above results also hold for linear-cost subtree-transfer.
2. Biological applications require us to extend the nni and linear-cost subtree-transfer models to weighted phylogenies, where edge weights indicate the length of evolution along each edge. We present a logarithmic-ratio approximation algorithm for nni and a ratio-2 approximation algorithm for linear-cost subtree-transfer, on weighted trees.
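To make the nni distance concrete, the following is a minimal brute-force sketch for unweighted, leaf-labeled binary trees. It is not the algorithm of the article: it searches the space of trees breadth-first and is only feasible for a handful of leaves. The representation and the names `node`, `nni_neighbors` and `nni_distance` are assumptions made for this illustration: the unrooted tree is rooted at one fixed leaf, which is left implicit, and the remainder is stored as nested unordered pairs.

```python
from collections import deque


def node(a, b):
    """Internal node of a binary tree, stored as an unordered pair of subtrees."""
    return frozenset((a, b))


def nni_neighbors(tree):
    """Yield every tree that is one nni move away from `tree`.

    Assumption of this sketch: `tree` is the part of an unrooted binary tree
    that hangs below one fixed, implicit leaf. Leaves are hashable labels
    (ints or strings); internal nodes are frozensets of two subtrees.
    """
    if not isinstance(tree, frozenset):
        return  # a single leaf has no internal edge below it
    left, right = tuple(tree)
    for child, sibling in ((left, right), (right, left)):
        if not isinstance(child, frozenset):
            continue  # the edge to a leaf is not an internal edge
        a, b = tuple(child)
        # nni across the edge (tree, child): exchange the sibling with a or b.
        yield node(node(sibling, b), a)
        yield node(node(a, sibling), b)
        # nni moves on internal edges deeper inside the child subtree.
        for changed in nni_neighbors(child):
            yield node(changed, sibling)


def nni_distance(start, goal):
    """Smallest number of nni moves from `start` to `goal`, by breadth-first search."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for nxt in nni_neighbors(tree):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("trees are not on the same leaf set")


# Four-leaf example: leaf 0 is the implicit root leaf, leaves 1..3 hang below it.
t_a = node(node(1, 2), 3)
t_b = node(node(1, 3), 2)
print(nni_distance(t_a, t_b))
```

Because the number of binary trees grows super-exponentially with the number of leaves, this exhaustive search only serves to illustrate the definition of the distance.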
Pages: 427-436
Page count: 10