A sublinear-time randomized approximation scheme for the Robinson-Foulds metric

Authors
Pattengale, Nicholas D. [1]
Moret, Bernard M. E. [1]
Affiliations
[1] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1 + epsilon) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.
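The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' Java implementation. It assumes each tree is given as a set of nontrivial bipartitions, each stored as a `frozenset` holding the taxa on one canonical side (here, the side that does not contain taxon `F`). The RF distance is then the size of the symmetric difference of the two sets, which equals the squared Euclidean distance between their 0/1 indicator vectors. A Johnson-Lindenstrauss style random projection to `k` dimensions approximately preserves that squared distance. The function names (`split_column`, `embed`, `exact_rf`, `approx_rf`), the hash-based generation of projection columns, and the example trees are all assumptions made for illustration.

```python
import hashlib
import random


def split_column(split, k, seed):
    """Pseudo-random +/-1 column of the projection matrix for one bipartition.

    The column is derived from a hash of the bipartition, so the full
    (and very wide) projection matrix is never stored.
    Assumes taxon names do not contain commas.
    """
    key = ",".join(sorted(split)) + "|" + str(seed)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [1.0 if rng.getrandbits(1) else -1.0 for _ in range(k)]


def embed(splits, k, seed=0):
    """Project a tree's bipartition indicator vector down to k dimensions."""
    vec = [0.0] * k
    for split in splits:
        col = split_column(split, k, seed)
        for i in range(k):
            vec[i] += col[i]
    scale = k ** -0.5  # 1/sqrt(k) scaling keeps squared norms unbiased
    return [x * scale for x in vec]


def exact_rf(a, b):
    """Exact RF distance: number of bipartitions found in only one tree."""
    return len(a ^ b)


def approx_rf(u, v):
    """Approximate RF distance: squared Euclidean distance of two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


# Hypothetical example: two unrooted trees on taxa A..F.
tree1 = {frozenset("AB"), frozenset("CD"), frozenset("ABCD")}
tree2 = {frozenset("AC"), frozenset("BD"), frozenset("ABCD")}

k = 4096  # embedding dimension; larger k gives a tighter approximation
e1, e2 = embed(tree1, k), embed(tree2, k)

print("exact RF: ", exact_rf(tree1, tree2))  # 4
print("approx RF:", approx_rf(e1, e2))
```

Once every tree in a collection has been embedded, each pairwise comparison costs O(k) regardless of tree size, which is where the speedup over repeated linear-time comparisons would come from. The choice of `k` trades precision against running time, in line with the parameter tradeoffs the abstract mentions.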
Pages: 221 - 230
Number of pages: 10
Related papers
28 in total
  • [1] A sublinear-time randomized approximation scheme for the Robinson-Foulds metric
    Pattengale, Nicholas D.
    Moret, Bernard M. E.
    [J]. Lect. Notes Comput. Sci. : 221 - 230
  • [2] Properties of the generalized Robinson-Foulds metric
    Borozan, L.
    Matijevic, D.
    Canzar, S.
    [J]. 2019 42ND INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2019, : 330 - 335
  • [3] Efficiently computing the Robinson-Foulds metric
    Pattengale, Nicholas D.
    Gottlieb, Eric J.
    Moret, Bernard M. E.
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2007, 14 (06) : 724 - 735
  • [4] A sublinear-time approximation scheme for bin packing
    Batu, Tugkan
    Berenbrink, Petra
    Sohler, Christian
    [J]. THEORETICAL COMPUTER SCIENCE, 2009, 410 (47-49) : 5082 - 5092
  • [5] Metrics for Phylogenetic Networks I: Generalizations of the Robinson-Foulds Metric
    Cardona, Gabriel
    Llabres, Merce
    Rossello, Francesc
    Valiente, Gabriel
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2009, 6 (01) : 46 - 61
  • [6] The Connection of the Generalized Robinson-Foulds Metric with Partial Wiener Indices
    Vukicevic, Damir
    Matijevic, Domagoj
    [J]. ACTA BIOTHEORETICA, 2023, 71 (01)
  • [7] A SUBLINEAR-TIME RANDOMIZED APPROXIMATION ALGORITHM FOR MATRIX GAMES
    GRIGORIADIS, MD
    KHACHIYAN, LG
    [J]. OPERATIONS RESEARCH LETTERS, 1995, 18 (02) : 53 - 58
  • [8] A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem
    Briand, Samuel
    Dessimoz, Christophe
    El-Mabrouk, Nadia
    Nevers, Yannis
    [J]. SYSTEMATIC BIOLOGY, 2022, 71 (06) : 1391 - 1403
  • [9] Sublinear-time approximation for clustering via random sampling
    Czumaj, A
    Sohler, C
    [J]. AUTOMATA, LANGUAGES AND PROGRAMMING, PROCEEDINGS, 2004, 3142 : 396 - 407
  • [10] Improved approximation guarantees for sublinear-time Fourier algorithms
    Iwen, Mark A.
    [J]. APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS, 2013, 34 (01) : 57 - 82