A deterministic method for initializing K-means clustering

Cited by: 0
Authors
Su, T [1]
Dy, J [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
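Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract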
The performance of K-means clustering depends on the initial guess of the partition. In this paper we motivate, theoretically and experimentally, the use of a deterministic divisive hierarchical method for initialization, which we refer to as PCA-Part (Principal Components Analysis Partitioning). The criterion that K-means clustering minimizes is the SSE (sum-squared-error) criterion. The first principal direction (the eigenvector corresponding to the largest eigenvalue of the covariance matrix) is the direction that contributes the most to the SSE. Hence, the first principal direction is a good candidate direction along which to project a cluster for splitting. This is the basis of the PCA-Part initialization method. Our experiments reveal that PCA-Part generally leads K-means to generate clusters with SSE values close to the minimum SSE values obtained by one hundred random-start runs. In addition, this deterministic initialization method often leads K-means to faster convergence (fewer iterations) than random methods. Furthermore, we show theoretically, and confirm experimentally on synthetic data, when PCA-Part may fail.
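The idea in the abstract can be illustrated with a minimal Python sketch using NumPy. The abstract does not say which cluster is split at each step or where along the principal direction the cut is made, so two choices below are assumptions rather than the authors' stated procedure: the cluster with the largest SSE is split next, and the cut is placed at the projection of the cluster mean. The function names (`sse`, `first_principal_direction`, `pca_part_init`) are hypothetical.

```python
import numpy as np


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def first_principal_direction(points):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]


def pca_part_init(X, k):
    """Return k initial centroids from repeated principal-direction splits."""
    clusters = [np.asarray(X, dtype=float)]
    while len(clusters) < k:
        # Assumption: split the cluster that currently has the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        cluster = clusters.pop(idx)
        if sse(cluster) == 0.0:
            raise ValueError("no cluster with distinct points left to split")
        # Project the centered points onto the first principal direction.
        centered = cluster - cluster.mean(axis=0)
        proj = centered @ first_principal_direction(cluster)
        # Assumption: cut at the projected mean, which is 0 after centering.
        clusters.extend([cluster[proj <= 0], cluster[proj > 0]])
    return np.vstack([c.mean(axis=0) for c in clusters])


# Two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]])
centroids = pca_part_init(X, 2)
```

Because no random numbers are involved, repeated calls on the same data return the same centroids. The resulting array can then be handed to any K-means implementation that accepts explicit starting centroids, for example scikit-learn's `KMeans(n_clusters=2, init=centroids, n_init=1)`.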
Pages: 784-786
Number of pages: 3