Fitting tree metrics: Hierarchical clustering and phylogeny

Cited by: 22
Authors
Ailon, N [1 ]
Charikar, M [1 ]
Affiliation
[1] Princeton Univ, Princeton, NJ 08544 USA
DOI
10.1109/SFCS.2005.36
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e., some measure of the difference between the tree metric and the given data). This problem arises in constructing an M-level hierarchical clustering of objects (or an ultrametric on objects) so as to match the given dissimilarity data, a basic problem in statistics. Viewed in this way, the problem is a generalization of the correlation clustering problem (which corresponds to M = 1). We give a very simple randomized combinatorial algorithm for the M-level hierarchical clustering problem that achieves an approximation ratio of M+2. This is a generalization of a previous factor 3 algorithm for correlation clustering on complete graphs. The problem of fitting tree metrics also arises in phylogeny, where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by taking the ℓ_p norm of the difference between the tree metric constructed and the given data. Previous results obtained a factor 3 approximation for finding the closest tree metric under the ℓ_∞ norm. No non-trivial approximation for general ℓ_p norms was known before. We present a novel LP formulation for this problem and use it to obtain an O((log n log log n)^(1/p)) approximation. En route, we obtain an O((log n log log n)^(1/p)) approximation for the closest ultrametric under the ℓ_p norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings, and may be useful in other contexts.
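The two notions the abstract relies on, an ultrametric and the ℓ_p fitting error, can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the paper's algorithm: the function names `is_ultrametric` and `lp_error`, the list-of-lists matrix layout, and the three-point example data are all hypothetical. An ultrametric is taken here in the standard sense, a symmetric distance satisfying d(i,k) <= max(d(i,j), d(j,k)) for every triple.

```python
from itertools import combinations
import math


def is_ultrametric(d, tol=1e-9):
    """Return True if the symmetric matrix d satisfies the strong
    triangle inequality d[i][k] <= max(d[i][j], d[j][k]) for all triples."""
    n = len(d)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][k] > max(d[i][j], d[j][k]) + tol:
                    return False
    return True


def lp_error(d, t, p):
    """l_p norm of the entrywise difference between the data d and a
    candidate metric t, taken over unordered pairs; p may be math.inf."""
    diffs = [abs(d[i][j] - t[i][j]) for i, j in combinations(range(len(d)), 2)]
    if p == math.inf:
        return max(diffs, default=0.0)
    return sum(x ** p for x in diffs) ** (1.0 / p)


# Hypothetical dissimilarity data on three objects (not from the paper).
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]

# A candidate ultrametric: objects 0 and 1 merge at height 1,
# and object 2 joins them at height 2.5.
U = [[0.0, 1.0, 2.5],
     [1.0, 0.0, 2.5],
     [2.5, 2.5, 0.0]]

print(is_ultrametric(D), is_ultrametric(U))
print(lp_error(D, U, 1), lp_error(D, U, 2), lp_error(D, U, math.inf))
```

In this example the data D violates the strong triangle inequality (3 > max(1, 2)), while U satisfies it and corresponds to a two-level hierarchy of clusterings. The fitting problem in the abstract asks for the ultrametric (or tree metric) that minimizes `lp_error` against the data; the sketch only evaluates one hand-picked candidate.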
Pages: 73-82
Number of pages: 10
Related papers
50 in total
  • [1] FITTING TREE METRICS: HIERARCHICAL CLUSTERING AND PHYLOGENY
    Ailon, Nir
    Charikar, Moses
    SIAM JOURNAL ON COMPUTING, 2011, 40 (05) : 1275 - 1291
  • [2] HyperAid: Denoising in Hyperbolic Spaces for Tree-fitting and Hierarchical Clustering
    Chien, Eli
    Tabaghi, Puoya
    Milenkovic, Olgica
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 201 - 211
  • [3] A new metrics for hierarchical clustering
    Yang, GW
    Shi, SM
    Wang, DX
    CHINESE JOURNAL OF ELECTRONICS, 2003, 12 (04): : 494 - 498
  • [4] Fitting distances by tree metrics with increment error
    Ma, B
    Wang, LS
    Zhang, LX
    JOURNAL OF COMBINATORIAL OPTIMIZATION, 1999, 3 (2-3) : 213 - 225
  • [5] Hierarchical Clustering via Spreading Metrics
    Roy, Aurko
    Pokutta, Sebastian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [6] Fitting Distances by Tree Metrics with Increment Error
    Bin Ma
    Lusheng Wang
    Louxin Zhang
    Journal of Combinatorial Optimization, 1999, 3 : 213 - 225
  • [7] Hierarchical Clustering via Spreading Metrics
    Roy, Aurko
    Pokutta, Sebastian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [8] Hierarchical Clustering and Tree Stability
    Saunders, Amanda
    Ashlock, Daniel
    Houghten, Sheridan
    2018 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2018, : 138 - 145
  • [9] On the approximability of numerical taxonomy (fitting distances by tree metrics)
    Agarwala, R
    Bafna, V
    Farach, M
    Paterson, M
    Thorup, M
    SIAM JOURNAL ON COMPUTING, 1999, 28 (03) : 1073 - 1085
  • [10] On the approximability of numerical taxonomy (fitting distances by tree metrics)
    Agarwala, R
    Bafna, V
    Farach, M
    Narayanan, B
    Paterson, M
    Thorup, M
    PROCEEDINGS OF THE SEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 1996, : 365 - 372