Stability yields a PTAS for k-Median and k-Means Clustering

Cited by: 39

Authors:

Awasthi, Pranjal [1]

Blum, Avrim [1]

Sheffet, Or [1]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE | 2010

Funding:

National Science Foundation (USA)

Keywords:

DOI:

10.1109/FOCS.2010.36

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification codes:

081202 ; 0835 ;

Abstract:

We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k - 1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/epsilon^2, then one can achieve a (1 + f(epsilon))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k - 1)-means optimal is more expensive than the k-means optimal by a factor of 1 + alpha for some constant alpha > 0, we can obtain a PTAS. In particular, under this assumption, for any epsilon > 0 we achieve a (1 + epsilon)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/epsilon and 1/alpha. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we in addition give a randomized algorithm with improved running time of n^{O(1)} (k log n)^{poly(1/epsilon, 1/alpha)}. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all (1 + alpha)-approximations are delta-close to a desired target clustering, in the case that all target clusters have size greater than delta n and alpha > 0 is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(delta) to delta when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly delta-close from O(delta n) to delta n. Our results are based on a new notion of clustering stability.
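
The stability condition in the abstract compares the optimal (k - 1)-means cost with the optimal k-means cost. The following is a minimal sketch of that comparison on a tiny point set, not the paper's algorithm: it finds both optima by exhaustive search, which is feasible only for a handful of points. The function names (`kmeans_cost`, `optimal_kmeans_cost`, `stability_gap`) and the sample points are illustrative assumptions.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an unused label contributes no cost
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return total


def optimal_kmeans_cost(points, k):
    """Exhaustive search over all k^n label assignments (tiny inputs only)."""
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def stability_gap(points, k):
    """Return alpha with OPT_{k-1} = (1 + alpha) * OPT_k; requires OPT_k > 0."""
    opt_k = optimal_kmeans_cost(points, k)
    opt_k_minus_1 = optimal_kmeans_cost(points, k - 1)
    return opt_k_minus_1 / opt_k - 1.0


# Hypothetical example: three well-separated pairs of points in the plane.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
alpha = stability_gap(points, k=3)
print(f"OPT_3 = {optimal_kmeans_cost(points, 3):.2f}, alpha = {alpha:.2f}")
```

On such an instance, merging any two of the three groups raises the cost sharply, so alpha is large; the abstract's assumption only requires alpha to be some positive constant.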


Pages: 309-318

Page count: 10
