A generalized Robinson-Foulds distance for labeled trees

Cited by: 11
Authors
Briand, Samuel [1]
Dessimoz, Christophe [2,3,4,5,6]
El-Mabrouk, Nadia [1]
Lafond, Manuel [7]
Lobinska, Gabriela [3]
Affiliations
[1] Univ Montreal, Comp Sci Dept, Montreal, PQ, Canada
[2] Univ Lausanne, Dept Computat Biol, Lausanne, Switzerland
[3] UCL, Dept Genet Evolut & Environm, London, England
[4] Univ Lausanne, Ctr Integrat Genom, Lausanne, Switzerland
[5] Swiss Inst Bioinformat, Lausanne, Switzerland
[6] UCL, Dept Comp Sci, London, England
[7] Univ Sherbrooke, Comp Sci Dept, Sherbrooke, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Swiss National Science Foundation
Keywords
Edit distance; Labeled trees; Robinson-Foulds; Tree metric; Phylogenetic trees
DOI
10.1186/s12864-020-07011-0
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and of being computable in linear time. For phylogenetic applications involving genes, however, the RF metric ignores a crucial aspect of the trees: the type of each branching event (e.g., speciation, duplication, or transfer).
Results: We extend RF to trees with labeled internal nodes by including a node flip operation alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that, contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e., edges shared between the two trees.
Conclusions: We provide a 2-approximation algorithm, which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations are available at .
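As background for the unlabeled baseline that the paper extends, the sketch below computes the classical RF distance between two rooted trees on the same leaf set as the size of the symmetric difference of their clusters (the leaf sets below internal nodes). Counting each edge contraction and each edge extension as one operation, this number equals the minimum number of such operations needed to turn one tree into the other. It is a simple set-based illustration under stated assumptions: trees are encoded as nested tuples of leaf-name strings, both trees share the same leaf set, and the count is unnormalized. The names `Tree`, `clusters` and `rf_distance` are hypothetical. It is not the linear-time algorithm the abstract refers to, and it ignores internal node labels, so it implements neither the paper's node flip operation nor its 2-approximation.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is either a leaf (a string label)
# or a tuple of child subtrees. Internal node labels are NOT represented.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the clusters (leaf sets) of all internal nodes of `tree`."""
    result: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            # Leaf: its singleton cluster is trivial and shared, so not recorded.
            return frozenset([node])
        # Internal node: its cluster is the union of its children's clusters.
        leaves = frozenset().union(*(visit(child) for child in node))
        result.add(leaves)
        return leaves

    visit(tree)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Unnormalized RF distance between two rooted trees.

    Assumption: both trees have the same leaf set. Each cluster present in
    only one tree costs one contraction (if in t1) or one extension (if in t2).
    """
    return len(clusters(t1) ^ clusters(t2))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(rf_distance(t1, t2))  # 2: {a,b} is contracted, {a,c} is extended
```

Here the clusters {a,b,c} and {a,b,c,d} are shared, so only the two differing clusters contribute. In the labeled setting studied in the paper, a shared cluster whose node labels differ would additionally call for a flip or other edits, which this sketch does not model.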
Pages: 13
Journal: BMC Genomics, vol. 21