A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

Cited by: 1
Author
Shai Ben-David
Affiliation
[1] University of Waterloo, School of Computer Science
Source
Machine Learning | 2007 / Vol. 66
Keywords
K-means clustering; K-median clustering; Sample-based clustering; Approximation algorithms; Description schemes
DOI
Not available
Abstract
We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clustering algorithms that approximate the optimal clustering. We show that K-median clustering, as well as the K-means and Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center-based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
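The sample-based idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes points on the real line with absolute distance, restricts candidate centers to the sampled points, and uses a hypothetical `draw` callable that stands in for constant-time uniform sampling. The names `kmedian_cost` and `sample_based_kmedian` are introduced here for illustration only.

```python
import itertools
import random


def kmedian_cost(points, centers):
    """Average distance from each point to its nearest center (1-D case)."""
    return sum(min(abs(p - c) for c in centers) for p in points) / len(points)


def sample_based_kmedian(draw, k, m, rng):
    """Pick k centers using only m i.i.d. draws, never the full input.

    draw: hypothetical callable returning one point sampled from the
          underlying distribution (assumed to cost constant time).
    """
    sample = [draw(rng) for _ in range(m)]
    candidates = sorted(set(sample))
    # Exhaustive search over k-subsets of the sampled points: the work
    # depends on m and k, not on the size of the set being clustered.
    best = min(itertools.combinations(candidates, k),
               key=lambda cs: kmedian_cost(sample, cs))
    return list(best)


# Toy input: two well-separated groups on the real line.
data = [0.0, 0.1, 0.2, 0.3, 0.4] * 200 + [10.0, 10.1, 10.2, 10.3, 10.4] * 200
rng = random.Random(0)
centers = sample_based_kmedian(lambda r: r.choice(data), k=2, m=30, rng=rng)

# The centers were chosen from the sample but are scored on the full input.
full_cost = kmedian_cost(data, centers)
```

Here the sample size `m` is fixed in advance, so the search cost stays the same if `data` grows; in the setting the abstract describes, `m` would instead be derived from the accuracy and confidence parameters.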
Pages: 243-257
Number of pages: 14
Related papers
50 in total
  • [41] A practical comparison of two K-Means clustering algorithms
    Gregory A Wilkin
    Xiuzhen Huang
    [J]. BMC Bioinformatics, 9
  • [42] Improving K-means clustering with enhanced Firefly Algorithms
    Xie, Hailun
    Zhang, Li
    Lim, Chee Peng
    Yu, Yonghong
    Liu, Chengyu
    Liu, Han
    Walters, Julie
    [J]. APPLIED SOFT COMPUTING, 2019, 84
  • [43] An Enhanced K-Means Genetic Algorithms for Optimal Clustering
    Anusha, M.
    Sathiaseelan, J. G. R.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 580 - 584
  • [44] Algorithms for K-means Clustering Problem with Balancing Constraint
    Wang Shouqiang
    Chi Zengxiao
    Zhan Sheng
    [J]. CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009, : 3967 - 3972
  • [45] Empirical Evaluation of K-Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means Clustering Algorithms
    Banerjee, Shreya
    Choudhary, Ankit
    Pal, Somnath
    [J]. 2015 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE), 2015, : 172 - 176
  • [46] Constant-Factor Approximation for Ordered k-Median
    Byrka, Jaroslaw
    Sornat, Krzysztof
    Spoerhase, Joachim
    [J]. STOC'18: PROCEEDINGS OF THE 50TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2018, : 620 - 631
  • [47] Towards Optimal Lower Bounds for k-Median and k-Means Coresets
    Cohen-Addad, Vincent
    Larsen, Kasper Green
    Saulpic, David
    Schwiegelshohn, Chris
    [J]. PROCEEDINGS OF THE 54TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '22), 2022, : 1038 - 1051
  • [48] Stability of k-means clustering
    Ben-David, Shai
    Pal, David
    Simon, Hans Ulrich
    [J]. LEARNING THEORY, PROCEEDINGS, 2007, 4539 : 20 - +
  • [49] Geodesic K-means Clustering
    Asgharbeygi, Nima
    Maleki, Arian
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3450 - 3453
  • [50] A General Framework for Dimensionality Reduction of K-Means Clustering
    Wu, Tong
    Xiao, Yanni
    Guo, Muhan
    Nie, Feiping
    [J]. JOURNAL OF CLASSIFICATION, 2020, 37 (03) : 616 - 631