Stability yields a PTAS for k-Median and k-Means Clustering

Cited by: 39
Authors
Awasthi, Pranjal [1]
Blum, Avrim [1]
Sheffet, Or [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
DOI
10.1109/FOCS.2010.36
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where $k$ is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal $(k-1)$-means clustering of the input is more expensive than the optimal k-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1 + f(\epsilon))$-approximation to the k-means optimal in time polynomial in $n$ and $k$ by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the $(k-1)$-means optimal is more expensive than the k-means optimal by a factor $1 + \alpha$ for some constant $\alpha > 0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon > 0$ we achieve a $(1 + \epsilon)$-approximation to the k-means optimal in time polynomial in $n$ and $k$, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon,\, 1/\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all $(1 + \alpha)$-approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha > 0$ is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target.
From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.
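The separation assumption in the abstract can be made concrete on a toy instance. The sketch below is not the paper's algorithm; it is a brute-force illustration that computes the optimal k-means cost by trying every labeling, then checks whether the optimal $(k-1)$-means cost is at least $(1 + \alpha)$ times the optimal k-means cost. The function names (`kmeans_cost`, `opt_kmeans`, `is_separated`) are hypothetical, and reading "more expensive by a factor $1 + \alpha$" as a non-strict inequality is an assumption.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        cluster = [p for p, lab in zip(points, labels) if lab == c]
        if not cluster:
            continue  # an empty cluster contributes nothing
        dim = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in cluster
        )
    return total


def opt_kmeans(points, k):
    """Optimal k-means cost by exhaustive search over all k**n labelings.

    Exponential time: usable only for a handful of points.
    """
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def is_separated(points, k, alpha):
    """Check the assumed condition OPT_{k-1} >= (1 + alpha) * OPT_k (requires k >= 2)."""
    return opt_kmeans(points, k - 1) >= (1 + alpha) * opt_kmeans(points, k)


# Two tight pairs far apart: splitting into 2 clusters is far cheaper than 1.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = is_separated(points, k=2, alpha=1.0)
```

On this instance the optimal 2-means cost is 1 and the optimal 1-means cost is 101, so the condition holds for any $\alpha \le 100$. The exhaustive search is only there to make the definition checkable; the paper's contribution is achieving a $(1 + \epsilon)$-approximation in time polynomial in $n$ and $k$ under this assumption.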
Pages: 309-318
Page count: 10