A deterministic method for initializing K-means clustering

Times cited: 0
Authors
Su, T [1]
Dy, J [1]
Affiliation
[1] Northeastern Univ, Boston, MA 02115 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The performance of K-means clustering depends on the initial guess of the partition. In this paper, we motivate theoretically and experimentally the use of a deterministic divisive hierarchical method for initialization, which we refer to as PCA-Part (Principal Components Analysis Partitioning). K-means clustering minimizes the SSE (sum-squared-error) criterion. The first principal direction (the eigenvector corresponding to the largest eigenvalue of the covariance matrix) is the direction along which the data contribute the most to the SSE. The first principal direction is therefore a good candidate direction onto which to project a cluster before splitting it. This is the basis of the PCA-Part initialization method. Our experiments reveal that PCA-Part generally leads K-means to produce clusters with SSE values close to the minimum SSE values obtained by one hundred random-start runs. In addition, this deterministic initialization method often makes K-means converge faster (in fewer iterations) than random initialization methods do. Finally, we show theoretically, and confirm experimentally on synthetic data, when PCA-Part may fail.
Pages: 784-786
Number of pages: 3
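The Python sketch below illustrates the splitting rule summarized in the abstract. It uses only the standard library and is an illustration under stated assumptions, not the authors' implementation. It assumes three things:

- The cluster with the largest SSE is split next.
- Each split uses the hyperplane through the cluster mean that is orthogonal to the cluster's first principal direction.
- That direction is found by power iteration on the scatter matrix.

The function names (`pca_part`, `kmeans`, `first_principal_direction`, `sse`, `mean`, `dist2`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical PCA-Part-style sketch. Assumptions (not taken from the paper's code):
# split the largest-SSE cluster, cut through its mean orthogonally to the first
# principal direction, and find that direction by power iteration.


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise centroid of a non-empty list of points."""
    n, d = len(points), len(points[0])
    return tuple(sum(p[j] for p in points) / n for j in range(d))


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: Sequence[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    c = mean(points)
    return sum(dist2(p, c) for p in points)


def first_principal_direction(points: Sequence[Point], iters: int = 200) -> List[float]:
    """Top eigenvector of the scatter matrix, estimated by power iteration.

    Power iteration is started from every coordinate axis and the result with the
    largest Rayleigh quotient is kept, so a start vector orthogonal to the top
    eigenvector cannot silently yield a different eigenvector.
    """
    c = mean(points)
    d = len(c)
    xs = [[p[j] - c[j] for j in range(d)] for p in points]
    # Scatter matrix S = sum_i x_i x_i^T (a multiple of the covariance matrix).
    s = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    best, best_val = [1.0] + [0.0] * (d - 1), -1.0
    for start in range(d):
        v = [1.0 if j == start else 0.0 for j in range(d)]
        for _ in range(iters):
            w = [sum(s[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # S annihilates this start vector
                break
            v = [t / norm for t in w]
        val = sum(v[i] * s[i][j] * v[j] for i in range(d) for j in range(d))
        if val > best_val:
            best, best_val = v, val
    return best


def pca_part(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic divisive initialization: returns k seed centroids."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Split the cluster that currently contributes the most SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters[idx]
        c = mean(target)
        v = first_principal_direction(target)
        # Signed projection of each centered point onto the principal direction.
        proj = [sum((p[j] - c[j]) * v[j] for j in range(len(c))) for p in target]
        left = [p for p, t in zip(target, proj) if t <= 0.0]
        right = [p for p, t in zip(target, proj) if t > 0.0]
        if not left or not right:
            raise ValueError("cannot split further: fewer than k distinct points")
        clusters[idx:idx + 1] = [left, right]
    return [mean(cl) for cl in clusters]


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int], int]:
    """Lloyd's K-means from the given seeds; returns (centroids, labels, iterations)."""
    cents = [tuple(c) for c in init]
    labels: List[int] = []
    for it in range(1, max_iter + 1):
        labels = [min(range(len(cents)), key=lambda i: dist2(p, cents[i])) for p in points]
        new = []
        for i, old in enumerate(cents):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new.append(mean(members) if members else old)  # keep empty clusters in place
        if new == cents:  # assignments are stable, so the centroids stop moving
            return cents, labels, it
        cents = new
    return cents, labels, max_iter


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
    seeds = pca_part(data, 2)
    cents, labels, iters = kmeans(data, seeds)
    print(seeds, iters)  # [(0.5, 0.5), (10.5, 10.5)] 1
```

On two well-separated square blobs, the first split already separates them, so K-means seeded with these centroids stops after one assignment pass. Because no random numbers are drawn, repeated calls return identical seeds. This is the deterministic property the abstract contrasts with random starts.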