Stability yields a PTAS for k-Median and k-Means Clustering

Cited by: 39
Authors
Awasthi, Pranjal [1]
Blum, Avrim [1]
Sheffet, Or [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
DOI
10.1109/FOCS.2010.36
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k - 1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of $1/\epsilon^2$, then one can achieve a $(1 + f(\epsilon))$-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k - 1)-means optimal is more expensive than the k-means optimal by a factor $1 + \alpha$ for some constant $\alpha > 0$, we can obtain a PTAS. In particular, under this assumption, for any $\epsilon > 0$ we achieve a $(1 + \epsilon)$-approximation to the k-means optimal in time polynomial in n and k, and exponential in $1/\epsilon$ and $1/\alpha$. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with improved running time of $n^{O(1)} (k \log n)^{\mathrm{poly}(1/\epsilon, 1/\alpha)}$. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all $(1 + \alpha)$-approximations are $\delta$-close to a desired target clustering, in the case that all target clusters have size greater than $\delta n$ and $\alpha > 0$ is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from $O(\delta)$ to $\delta$ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly $\delta$-close from $O(\delta n)$ to $\delta n$. Our results are based on a new notion of clustering stability.
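The stability condition in the abstract compares the optimal (k - 1)-means cost with the optimal k-means cost. The following is a minimal illustrative sketch of that comparison, not the paper's algorithm: it finds both optima by exhaustive enumeration, which is exponential in n and only feasible on toy inputs. The function names (`kmeans_cost`, `brute_force_opt`, `separation_ratio`, `is_stable`) are hypothetical and chosen for this example.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum(
            sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members
        )
    return cost


def brute_force_opt(points, k):
    """Optimal k-means cost by trying every labeling (toy inputs only)."""
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def separation_ratio(points, k):
    """Return OPT_{k-1} / OPT_k, the quantity the stability condition bounds."""
    opt_k = brute_force_opt(points, k)
    opt_k_minus_1 = brute_force_opt(points, k - 1)
    if opt_k == 0:
        return float("inf")  # degenerate case: k clusters fit the data exactly
    return opt_k_minus_1 / opt_k


def is_stable(points, k, alpha):
    """Check the abstract's condition: OPT_{k-1} >= (1 + alpha) * OPT_k."""
    return separation_ratio(points, k) >= 1 + alpha


# Two tight pairs of points far apart: merging them into one cluster is costly.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(separation_ratio(toy_points, 2))
print(is_stable(toy_points, 2, alpha=0.5))
```

On well-separated data such as `toy_points`, the ratio is far above 1, so the condition holds for any moderate constant alpha; on data with no natural k-cluster structure the ratio approaches 1 and the assumption fails.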
Pages: 309-318
Page count: 10