HOW THE INITIALIZATION AFFECTS THE STABILITY OF THE k-MEANS ALGORITHM

Cited by: 27
Authors
Bubeck, Sebastien [1 ]
Meila, Marina [2 ]
von Luxburg, Ulrike [3 ]
Affiliations
[1] Ctr Recerca Matemat Barcelona, Barcelona, Spain
[2] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[3] Max Planck Inst Biol Cybernet, Tubingen, Germany
Keywords
Clustering; k-means; stability; model selection
DOI
10.1051/ps/2012013
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the cost of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
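The dependence on initialization described in the abstract can be illustrated with a minimal sketch. This is not the paper's construction: the toy 1-D data set, the two initializations, and the helper names `lloyd`, `cost`, and `same_partition` are all assumptions chosen for illustration. It runs Lloyd's algorithm from two starting points and compares the resulting clusterings, not only their costs.

```python
def lloyd(points, centers, max_iter=100):
    """Run Lloyd's algorithm on 1-D data from the given initial centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            for x in points
        ]
        # Update step: each center moves to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # fixed point reached
            break
        centers = new_centers
    return labels, centers


def cost(points, labels, centers):
    """k-means objective: sum of squared distances to the assigned center."""
    return sum((x - centers[lab]) ** 2 for x, lab in zip(points, labels))


def same_partition(a, b):
    """True if two label vectors describe the same clustering up to relabeling."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))


# Hypothetical toy data: three well-separated pairs of points, k = 3.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

# One initial center per true group.
labels_a, centers_a = lloyd(points, [0.0, 10.0, 20.0])
# Two initial centers in the same group: the algorithm gets stuck.
labels_b, centers_b = lloyd(points, [0.0, 1.0, 15.0])

print(labels_a, cost(points, labels_a, centers_a))
print(labels_b, cost(points, labels_b, centers_b))
print(same_partition(labels_a, labels_b))
```

In this toy case both runs stop at fixed points of the algorithm, but the second one splits one pair and merges the other two, so the two clusterings differ. Repeating such comparisons over many random initializations is one simple way to turn this into an empirical stability score for a given k.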
Pages: 436-452
Number of pages: 17