Approximate Hierarchical Clustering via Sparsest Cut and Spreading Metrics

Authors
Charikar, Moses [1]
Chatziafratis, Vaggos [1]
Affiliation
[1] Stanford University, Department of Computer Science, Stanford, CA 94305, USA
Keywords
Minimum linear arrangement; algorithms
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Dasgupta recently introduced a cost function for the hierarchical clustering of a set of points given pairwise similarities between them. He showed that this function is NP-hard to optimize, but that a top-down recursive partitioning heuristic based on an $\alpha(n)$-approximation algorithm for uniform sparsest cut gives an approximation of $O(\alpha(n) \log n)$ (the current best algorithm has $\alpha(n) = O(\sqrt{\log n})$). We show that the aforementioned sparsest cut heuristic in fact obtains an $O(\alpha(n))$-approximation. The algorithm also applies to a generalized cost function studied by Dasgupta. Moreover, we obtain a strong inapproximability result, showing that the hierarchical clustering objective is hard to approximate to within any constant factor assuming the Small-Set Expansion (SSE) Hypothesis. Finally, we discuss approximation algorithms based on convex relaxations. We present a spreading metric SDP relaxation for the problem and show that it has integrality gap at most $O(\sqrt{\log n})$. The advantage of the SDP relative to the sparsest cut heuristic is that it provides an explicit lower bound on the optimal solution and could potentially yield an even better approximation for hierarchical clustering. In fact, our analysis of this SDP served as the inspiration for our improved analysis of the sparsest cut heuristic. We also show that a spreading metric LP relaxation gives an $O(\log n)$-approximation.
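To make the two objects in the abstract concrete, here is a minimal Python sketch of Dasgupta's cost function and the top-down recursive sparsest-cut heuristic. Dasgupta's cost charges each pair $(i, j)$ its similarity $w_{ij}$ times the number of leaves under their lowest common ancestor. Equivalently, every internal node whose leaf set $S$ splits into $A$ and $B$ pays $|S| \cdot w(A, B)$. The heuristic repeatedly splits a set along a uniform sparsest cut, which minimizes $w(A, B) / (|A| \cdot |B|)$.

The sketch rests on several assumptions, none of which come from the paper:
- Similarities are given as a dict keyed by unordered pairs.
- Trees are nested 2-tuples.
- All function names are hypothetical.
- The sparsest cut is found exactly by brute force in exponential time. This is a swapped-in stand-in for the $\alpha(n)$-approximation algorithm the abstract refers to, so it only suits tiny inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical representation: a tree is a leaf (int) or a pair (left, right).
Tree = Union[int, Tuple["Tree", "Tree"]]
Weights = Dict[FrozenSet[int], float]  # similarity w_ij keyed by {i, j}


def cut_weight(w: Weights, A: List[int], B: List[int]) -> float:
    """Total similarity w(A, B) crossing the cut between A and B."""
    return sum(w.get(frozenset((a, b)), 0.0) for a in A for b in B)


def leaves(t: Tree) -> List[int]:
    """Leaves of a tree, left to right."""
    if isinstance(t, int):
        return [t]
    return leaves(t[0]) + leaves(t[1])


def dasgupta_cost(t: Tree, w: Weights) -> float:
    """Dasgupta's cost: each internal node splitting S into (A, B) pays |S| * w(A, B)."""
    if isinstance(t, int):
        return 0.0
    A, B = leaves(t[0]), leaves(t[1])
    here = (len(A) + len(B)) * cut_weight(w, A, B)
    return here + dasgupta_cost(t[0], w) + dasgupta_cost(t[1], w)


def sparsest_cut_bruteforce(S: List[int], w: Weights) -> Tuple[List[int], List[int]]:
    """Exact uniform sparsest cut of S, minimizing w(A, B) / (|A| * |B|).

    Exponential time: a stand-in for an alpha(n)-approximation algorithm.
    """
    best, best_ratio = None, float("inf")
    first, rest = S[0], S[1:]
    # Keep S[0] in A so each unordered partition is tried once; B is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            A = [first, *extra]
            B = [v for v in S if v not in A]
            ratio = cut_weight(w, A, B) / (len(A) * len(B))
            if ratio < best_ratio:  # ties keep the first partition found
                best, best_ratio = (A, B), ratio
    return best


def recursive_sparsest_cut(S: List[int], w: Weights) -> Tree:
    """Top-down heuristic: split S by a sparsest cut, then recurse on both sides."""
    if len(S) == 1:
        return S[0]
    A, B = sparsest_cut_bruteforce(S, w)
    return (recursive_sparsest_cut(A, w), recursive_sparsest_cut(B, w))


if __name__ == "__main__":
    # Two unit-weight triangles {0,1,2} and {3,4,5} joined by a weak edge (2, 3).
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
             (2, 3, 0.1)]
    w = {frozenset((u, v)): x for u, v, x in edges}
    tree = recursive_sparsest_cut(list(range(6)), w)
    print(tree)  # ((0, (1, 2)), (3, (4, 5)))
    print(dasgupta_cost(tree, w))
```

On this toy input, the first sparsest cut separates the two triangles along the weak edge. The root then pays only $6 \cdot 0.1$, while the dense triangles are split lower in the tree, where $|S|$ is small. That is the behavior the cost function rewards.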
Pages: 841–854
Page count: 14