In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 81
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliation
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering;
DOI
10.3233/IDA-2007-11402
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized with random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering minimizes the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the direction that contributes most to the sum-squared-error. Hence, a good candidate direction along which to project a cluster for splitting is the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood, and minimizing the determinant of each cluster's covariance matrix helps to increase the likelihood. The largest eigenvector contributes most to that determinant and is thus a good candidate direction for splitting. However, PCA is computationally expensive. We therefore introduce Var-Part, which is computationally cheaper (its complexity equals that of one K-means iteration) and approximates PCA partitioning under a diagonal covariance assumption. Experiments reveal that Var-Part performs similarly to PCA partitioning, sometimes better, and leads K-means (and GMM) to sum-squared-error (and likelihood) values close to the optima obtained over several random-start runs, often with faster convergence.
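The recursive splitting idea the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the choice of which cluster to split (largest sum-squared-error) and the split point (the projected mean) are assumptions made for the sketch.

```python
import numpy as np

def split_init(X, k, var_part=False):
    """Deterministic K-means initialization by recursive splitting.

    Sketch of the PCA-partitioning / Var-Part idea: repeatedly split the
    cluster with the largest sum-squared-error along either its principal
    eigenvector (PCA partitioning) or the coordinate axis of largest
    variance (Var-Part, a diagonal-covariance approximation).
    """
    clusters = [X]
    while len(clusters) < k:
        # Pick the cluster with the largest sum-squared-error to split.
        sses = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        c = clusters.pop(int(np.argmax(sses)))
        mu = c.mean(axis=0)
        if var_part:
            # Var-Part: split along the coordinate with the largest variance,
            # i.e. treat the covariance matrix as diagonal.
            direction = np.eye(c.shape[1])[int(np.argmax(c.var(axis=0)))]
        else:
            # PCA partitioning: split along the principal eigenvector.
            _, vecs = np.linalg.eigh(np.cov(c, rowvar=False))
            direction = vecs[:, -1]  # eigenvector with the largest eigenvalue
        # Partition at the projected mean of the cluster.
        proj = (c - mu) @ direction
        clusters.append(c[proj <= 0])
        clusters.append(c[proj > 0])
    # Initial centroids are the means of the resulting partitions.
    return np.vstack([c.mean(axis=0) for c in clusters])
```

Var-Part avoids the eigendecomposition entirely, which is what makes its per-split cost comparable to a single K-means iteration.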
Pages: 319 - 338
Page count: 20
Related Papers
(50 in total)
  • [1] A deterministic method for initializing K-means clustering
    Su, T
    Dy, J
    [J]. ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, : 784 - 786
  • [2] Initializing k-means Clustering by Bootstrap and Data Depth
    Torrente, Aurora
    Romo, Juan
    [J]. Journal of Classification, 2021, 38 : 232 - 256
  • [3] Initializing K-means Clustering Using Affinity Propagation
    Zhu, Yan
    Yu, Jian
    Jia, Caiyan
    [J]. HIS 2009: 2009 NINTH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS, VOL 1, PROCEEDINGS, 2009, : 338 - 343
  • [4] A novel approach for initializing the spherical K-means clustering algorithm
    Duwairi, Rehab
    Abu-Rahmeh, Mohammed
    [J]. SIMULATION MODELLING PRACTICE AND THEORY, 2015, 54 : 49 - 63
  • [5] Deterministic Feature Selection for k-Means Clustering
    Boutsidis, Christos
    Magdon-Ismail, Malik
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (09) : 6099 - 6110
  • [6] An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection
    Saxena, Amit
    Wang, John
    Sintunavarat, Wutiphol
    [J]. INTERNATIONAL JOURNAL OF SOFTWARE SCIENCE AND COMPUTATIONAL INTELLIGENCE-IJSSCI, 2021, 13 (01): : 1 - 16
  • [7] Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques
    Steinley, Douglas
    Brusco, Michael J.
    [J]. JOURNAL OF CLASSIFICATION, 2007, 24 (01) : 99 - 121
  • [8] Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
    de Amorim, Renato Cordeiro
    Mirkin, Boris
    [J]. PATTERN RECOGNITION, 2012, 45 (03) : 1061 - 1075
  • [9] K-means and Gaussian mixture modeling with a separation constraint
    Jiang, He
    Arias-Castro, Ery
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024,