Hierarchical Clustering via Spreading Metrics

Cited by: 0
Authors
Roy, Aurko [1]
Pokutta, Sebastian [2]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, ISyE, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the cost function for hierarchical clusterings introduced by [16], in which hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [16] that a top-down algorithm returns a hierarchical clustering of cost at most O(alpha(n) log n) times the cost of the optimal hierarchical clustering, where alpha(n) is the approximation ratio of the Sparsest Cut subroutine used. Thus, using the best known approximation algorithm for Sparsest Cut, due to Arora-Rao-Vazirani, the top-down algorithm returns a hierarchical clustering of cost at most O(log^{3/2} n) times the cost of the optimal solution. We improve this by giving an O(log n)-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of the ultrametrics induced by this cost function, an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and a method for iteratively rounding an LP relaxation of this formulation using sphere growing, an idea that has been used extensively in the context of graph partitioning. We also prove that our algorithm returns an O(log n)-approximate hierarchical clustering for a generalization of this cost function that was also studied in [16]. Finally, we give constant-factor inapproximability results for this problem.
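The abstract does not restate the cost function itself. As a rough illustration only, the sketch below assumes that [16] refers to the cost in which every pair of points (i, j) contributes its similarity weight multiplied by the number of leaves under the lowest common ancestor of i and j in the hierarchy. The nested-tuple tree encoding, the `weight` dictionary, and the function names are hypothetical choices made for this example. They are not taken from the paper, and the code only evaluates the cost of a given tree; it is not the paper's approximation algorithm.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves of a hierarchy encoded as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def hierarchy_cost(tree, weight):
    """Assumed cost: sum over pairs of weight(i, j) times the number of
    leaves under the lowest common ancestor of i and j."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no pair
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(group) for group in child_leaves)
    cost = 0.0
    # Pairs that fall into different children are split at this node,
    # so this node is their lowest common ancestor.
    for group_a, group_b in combinations(child_leaves, 2):
        for i in group_a:
            for j in group_b:
                cost += weight.get(frozenset((i, j)), 0.0) * n_here
    # Pairs inside the same child are charged further down the tree.
    for child in tree:
        cost += hierarchy_cost(child, weight)
    return cost


# Hypothetical similarity weights on four points.
weight = {
    frozenset(("a", "b")): 1.0,
    frozenset(("c", "d")): 1.0,
    frozenset(("b", "c")): 0.1,
}

good_tree = (("a", "b"), ("c", "d"))  # similar points merged low in the tree
bad_tree = (("a", "c"), ("b", "d"))   # similar points split at the root

good_cost = hierarchy_cost(good_tree, weight)
bad_cost = hierarchy_cost(bad_tree, weight)
```

Under this assumed definition, `good_tree` is cheaper than `bad_tree`, because heavily weighted pairs are separated only near the leaves, where the subtree sizes are small.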
Pages: 9
Related papers
50 in total
  • [1] Hierarchical Clustering via Spreading Metrics
    Roy, Aurko
    Pokutta, Sebastian
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [2] Approximate Hierarchical Clustering via Sparsest Cut and Spreading Metrics
    Charikar, Moses
    Chatziafratis, Vaggos
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2017, : 841 - 854
  • [3] A new metrics for hierarchical clustering
    Yang, GW
    Shi, SM
    Wang, DX
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2003, 12 (04) : 494 - 498
  • [4] Hierarchical Clustering via Sketches and Hierarchical Correlation Clustering
    Vainstein, Danny
    Chatziafratis, Vaggos
    Citovsky, Gui
    Rajagopalan, Anand
    Mahdian, Mohammad
    Azar, Yossi
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 559 - +
  • [5] FITTING TREE METRICS: HIERARCHICAL CLUSTERING AND PHYLOGENY
    Ailon, Nir
    Charikar, Moses
    [J]. SIAM JOURNAL ON COMPUTING, 2011, 40 (05) : 1275 - 1291
  • [6] Fitting tree metrics: Hierarchical clustering and phylogeny
    Ailon, N
    Charikar, M
    [J]. 46TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2005, : 73 - 82
  • [7] A COMPARISON OF SIMILARITY METRICS FOR HIERARCHICAL CLUSTERING OF MEDICAL INTERVENTIONS
    Vong, Wan-Tze
    Then, Patrick Hang Hui
    [J]. JP JOURNAL OF BIOSTATISTICS, 2016, 13 (01) : 1 - 27
  • [8] A study on Hierarchical Clustering and the Distance metrics for Identifying Architectural Styles
    Mercioni, Marina Adriana
    Holban, Stefan
    [J]. 2019 INTERNATIONAL CONFERENCE ON ENERGY AND ENVIRONMENT (CIEM), 2019, : 49 - 53
  • [9] A Comparison of Distance Metrics in Semi-supervised Hierarchical Clustering
    Aljohani, Abeer
    Lai, Daphne Teck Ching
    Bell, Paul C.
    Edirisinghe, Eran A.
    [J]. INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, 2017, 10363 : 719 - 731
  • [10] Hierarchical Overlapping Clustering of Network Data Using Cut Metrics
    Gama, Fernando
    Segarra, Santiago
    Ribeiro, Alejandro
    [J]. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2018, 4 (02) : 392 - 406