Scalable and Extensible Robinson-Foulds for Comparative Phylogenetics

Authors
Chon, Alvin [1]
Gorecki, Pawel [2]
Eulenstein, Oliver [1]
Huang, Xiaoqiu [1]
Jannesari, Ali [1]
Affiliations
[1] Iowa State Univ, Bioinformat & Computat Biol, Ames, IA 50011 USA
[2] Univ Warsaw, Inst Informat, Warsaw, Poland
Keywords
Comparative phylogenetics; Robinson-Foulds distance; algorithm; parallelization; distance; trees; gene
DOI
10.1109/IPDPSW55747.2022.00041
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Robinson-Foulds (RF) is a widely used metric in phylogenetic analyses, including clustering and the generation of consensus or most-parsimonious trees. Current methods suffer from one or more of the following limitations: they compute one tree against one tree at a time, support only the basic RF calculation, operate on a single tree collection, do not scale, or restrict the taxa. This paper presents Bipartition Frequency Hash Robinson-Foulds (BFHRF), a scalable and extensible approach for computing the average RF between disparate collections of binary evolutionary trees. The novelty of the approach is the use of a bipartition frequency hash data structure to perform parallelized tree-versus-hash comparisons in place of all possible tree-versus-tree comparisons. The data structure and the updated computation algorithm result in an order-of-magnitude reduction in both runtime and memory usage: BFHRF is 39x faster and uses 22x less memory than HashRF, a fast current method. Additionally, the tree collection distribution can be modified to support RF variants and variable taxa, because the hash imposes no restrictions and retains all bipartitions. Lastly, BFHRF is implemented in a modular way and provides an easy-to-use installation and interface for calculating the average RF of query trees against a collection of reference trees. https://github.com/achon/bfhrf
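The tree-versus-hash idea in the abstract can be illustrated with a small sketch. This is not the BFHRF code and does not use its interface; it is a minimal single-threaded illustration under several assumptions: trees are nested Python tuples of leaf names, all trees share one taxon set, RF is taken as the unnormalized size of the symmetric difference of bipartition sets, and the helper names (bipartitions, build_frequency_hash, average_rf) are hypothetical. With n reference trees, the summed RF of a query tree Q against all of them equals n*|B(Q)| plus the total number of reference bipartitions minus twice the summed frequencies of Q's bipartitions, so one pass over Q's bipartitions replaces n pairwise comparisons.

```python
from collections import Counter


def bipartitions(tree):
    """Return the non-trivial bipartitions of a nested-tuple tree.

    Each bipartition is stored as the side that does NOT contain the
    smallest taxon label, so both orientations map to the same key.
    """
    taxa = set()

    def collect(node):
        # Gather every leaf label in the tree.
        if isinstance(node, str):
            taxa.add(node)
        else:
            for child in node:
                collect(child)

    collect(tree)
    taxa = frozenset(taxa)
    anchor = min(taxa)
    result = set()

    def visit(node):
        # Return the leaf set below this node and record its bipartition.
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            result.add(side)
        return clade

    visit(tree)
    return result


def build_frequency_hash(reference_trees):
    """Count in how many reference trees each bipartition occurs."""
    freq = Counter()
    total = 0
    for tree in reference_trees:
        bips = bipartitions(tree)
        freq.update(bips)
        total += len(bips)
    return freq, total


def average_rf(query, freq, total, n_refs):
    """Average RF of one query tree against all hashed reference trees."""
    q = bipartitions(query)
    shared = sum(freq[b] for b in q)  # Counter yields 0 for unseen keys
    return (n_refs * len(q) + total - 2 * shared) / n_refs


# Illustrative data (hypothetical trees on five taxa).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
freq, total = build_frequency_hash([t2, t1])
print(average_rf(t1, freq, total, 2))  # 1.0
```

In this sketch each query is independent of the others once the hash is built, which is what makes the comparison step straightforward to run in parallel over query trees.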
Pages: 166-175
Page count: 10