A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

Cited by: 1
Author
Shai Ben-David
Affiliation
[1] University of Waterloo, School of Computer Science
Source
Machine Learning | 2007 / Vol. 66
Keywords
K-means clustering; K-median clustering; Sample-based clustering; Approximation algorithms; Description schemes
DOI
Not available
Abstract
We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clustering algorithms that approximate the optimal clustering. We show that K-median clustering, as well as the K-means and Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center-based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
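The sample-based setting described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the sample size of 30, the synthetic two-cluster data, and the exhaustive search over sample points are all assumptions chosen for illustration. The point it shows is that the centers are selected by looking only at a fixed-size i.i.d. sample, so the selection cost does not grow with the size of the input set.

```python
import itertools
import math
import random


def kmedian_cost(points, centers):
    """Average distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)


def sample_based_kmedian(points, k, sample_size, rng):
    """Choose k centers using only a uniform i.i.d. sample of the input."""
    # Sampling with replacement models i.i.d. draws from the input set.
    sample = [rng.choice(points) for _ in range(sample_size)]
    candidates = sorted(set(sample))
    if len(candidates) <= k:
        return candidates
    # Assumption: exhaustive search over k-subsets of the sample stands in
    # for whatever search step the paper actually uses.
    best = min(itertools.combinations(candidates, k),
               key=lambda cs: kmedian_cost(sample, cs))
    return list(best)


rng = random.Random(0)
# Hypothetical input: two well-separated Gaussian clusters in the plane.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)] + \
         [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(5000)]

# Centers are chosen from 30 sampled points, regardless of len(points).
centers = sample_based_kmedian(points, k=2, sample_size=30, rng=rng)

# The chosen centers are then evaluated on the full input set.
full_cost = kmedian_cost(points, centers)
```

Only the final evaluation touches all 10,000 points; the selection step would take the same time on an input a thousand times larger. The sample size here is illustrative and is not derived from the accuracy and confidence bounds in the paper.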
Pages: 243 - 257
Number of pages: 14
Related papers
50 in total
  • [1] A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering
    Ben-David, Shai
    [J]. MACHINE LEARNING, 2007, 66 (2-3) : 243 - 257
  • [2] A framework for statistical clustering with a constant time approximation algorithms for K-median clustering
    Ben-David, S
    [J]. LEARNING THEORY, PROCEEDINGS, 2004, 3120 : 415 - 426
  • [3] Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
    Li, Jinhua
    Song, Shiji
    Zhang, Yuli
    Zhou, Zhen
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [4] Smaller Coresets for k-Median and k-Means Clustering
    Sariel Har-Peled
    Akash Kushal
    [J]. Discrete & Computational Geometry, 2007, 37 : 3 - 19
  • [5] Smaller coresets for k-median and k-means clustering
    Har-Peled, Sariel
    Kushal, Akash
    [J]. DISCRETE & COMPUTATIONAL GEOMETRY, 2007, 37 (01) : 3 - 19
  • [6] Stability yields a PTAS for k-Median and k-Means Clustering
    Awasthi, Pranjal
    Blum, Avrim
    Sheffet, Or
    [J]. 2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, : 309 - 318
  • [7] Outlier Detection using Clustering Techniques - K-means and K-median
    Angelin, B.
    Geetha, A.
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS 2020), 2020, : 373 - 378
  • [8] A Constant Factor Approximation Algorithm for k-Median Clustering with Outliers
    Chen, Ke
    [J]. PROCEEDINGS OF THE NINETEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2008, : 826 - 835
  • [9] On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications
    Chen, Ke
    [J]. SIAM JOURNAL ON COMPUTING, 2009, 39 (03) : 923 - 947
  • [10] Constant Approximation for k-Median and k-Means with Outliers via Iterative Rounding
    Krishnaswamy, Ravishankar
    Li, Shi
    Sandeep, Sai
    [J]. STOC'18: PROCEEDINGS OF THE 50TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2018, : 646 - 659