k-Means++ under approximation stability

Cited by: 13
Authors
Agarwal, Manu [1 ]
Jaiswal, Ragesh [2 ]
Pal, Arindam [3 ]
Affiliations
[1] IIT Jodhpur, Jodhpur, Rajasthan, India
[2] IIT Delhi, New Delhi, India
[3] TCS Innovation Labs, Kolkata, India
Keywords
k-means clustering; k-means++; approximation stability
DOI
10.1016/j.tcs.2015.04.030
CLC Number (Chinese Library Classification)
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
One of the most popular methods for choosing initial centers for Lloyd's heuristic is the k-means++ seeding algorithm. It is a simple sampling procedure: pick the first center uniformly at random from the given points, and then, for i = 2, 3, ..., k, pick a point to be the i-th center with probability proportional to its squared Euclidean distance to the nearest of the (i - 1) previously chosen centers. The k-means++ seeding algorithm is known to exhibit nice properties. In particular, it tends to perform well when the optimal clusters are separated in some sense; intuitively, this is because the sampling gives preference to points that lie far from the previously chosen centers. One separation condition studied in the past is due to Ostrovsky et al. [9]. Jaiswal and Garg [8] showed that if a dataset satisfies the separation condition of [9], then this sampling algorithm gives a constant-factor approximation with probability Ω(1/k) on that dataset. Another separation condition, strictly weaker than that of [9], is the approximation stability condition studied by Balcan et al. [5]. In this work, we show that the sampling algorithm gives a constant-factor approximation with probability Ω(1/k) on any dataset that satisfies the separation condition of [5], provided the optimal k clusters are not too small. We also give a negative result for datasets that have small optimal clusters. © 2015 Elsevier B.V. All rights reserved.
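To make the sampling procedure described in the abstract concrete, here is a minimal Python sketch of the k-means++ seeding step (D²-sampling). It is an illustration only, not code from the paper; the function name kmeans_pp_seeding and the toy dataset are invented for this example.

    import random

    def kmeans_pp_seeding(points, k):
        # First center: chosen uniformly at random from the data.
        centers = [random.choice(points)]
        for _ in range(1, k):
            # Squared Euclidean distance from each point to its
            # nearest already-chosen center (the D^2 weights).
            d2 = [min(sum((px - cx) ** 2 for px, cx in zip(p, c))
                      for c in centers)
                  for p in points]
            # Next center: sampled with probability proportional to d2.
            centers.append(random.choices(points, weights=d2, k=1)[0])
        return centers

    # Illustrative usage on a toy 2-D dataset with k = 3.
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1)]
    print(kmeans_pp_seeding(data, k=3))

Note that a point already chosen as a center receives weight zero and is never re-sampled, which reflects the intuition above that the procedure prefers points far from the current centers.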
Pages: 37-51
Number of pages: 15
Related Papers
50 records in total
  • [1] Efficient k-Means++ Approximation with MapReduce
    Xu, Yujie
    Qu, Wenyu
    Li, Zhiyang
    Min, Geyong
    Li, Keqiu
    Liu, Zhaobin
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (12) : 3135 - 3144
  • [2] Improved Guarantees for k-means++ and k-means++ Parallel
    Makarychev, Konstantin
    Reddy, Aravind
    Shan, Liren
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [3] Exact Acceleration of K-Means++ and K-Means∥
    Raff, Edward
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2928 - 2935
  • [4] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [5] k-means++: Few More Steps Yield Constant Approximation
    Choo, Davin
    Grunau, Christoph
    Portmann, Julian
    Rozhon, Vaclav
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [6] On the Consistency of k-means++ algorithm
    Klopotek, Mieczyslaw A.
    [J]. FUNDAMENTA INFORMATICAE, 2020, 172 (04) : 361 - 377
  • [7] Comparison of K-means and K-means++ for image compression with thermographic images
    Biswas, Hridoy
    Umbaugh, Scott E.
    Marino, Dominic
    Sackman, Joseph
    [J]. THERMOSENSE: THERMAL INFRARED APPLICATIONS XLIII, 2021, 11743
  • [8] Robust k-means++
    Deshpande, Amit
    Kacham, Praneeth
    Pratap, Rameshwar
    [J]. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020), 2020, 124 : 799 - 808
  • [9] k-variates++: more pluses in the k-means++
    Nock, Richard
    Canyasse, Raphael
    Boreli, Roksana
    Nielsen, Frank
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [10] Global k-means++: an effective relaxation of the global k-means clustering algorithm
    Vardakas, Georgios
    Likas, Aristidis
    [J]. APPLIED INTELLIGENCE, 2024, 54 (19) : 8876 - 8888