A sublinear-time randomized approximation scheme for the Robinson-Foulds metric

被引:0
|
作者
Pattengale, Nicholas D. [1 ]
Moret, Bernard M. E. [1 ]
机构
[1] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
关键词
D O I
暂无
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day's algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1 + epsilon) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.
引用
收藏
页码:221 / 230
页数:10
相关论文
共 29 条