In search of deterministic methods for initializing K-means and Gaussian mixture clustering

Cited by: 81
Authors
Su, Ting [1 ]
Dy, Jennifer G. [1 ]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
K-means; Gaussian mixture; initialization; PCA; clustering;
DOI
10.3233/IDA-2007-11402
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of K-means and Gaussian mixture model (GMM) clustering depends on the initial guess of partitions. Typically, clustering algorithms are initialized by random starts. In our search for a deterministic method, we found two promising approaches: principal component analysis (PCA) partitioning and Var-Part (Variance Partitioning). K-means clustering tries to minimize the sum-squared-error criterion. The eigenvector with the largest eigenvalue is the direction that contributes most to the sum-squared error. Hence, a good candidate direction along which to project a cluster for splitting is the cluster's largest eigenvector, which is the basis for PCA partitioning. Similarly, GMM clustering maximizes the likelihood; minimizing the determinant of each cluster's covariance matrix helps to increase the likelihood. The largest eigenvector contributes most to that determinant and is thus also a good candidate direction for splitting. However, PCA is computationally expensive. We thus introduce Var-Part, which is computationally less complex (with complexity equal to one K-means iteration) and approximates PCA partitioning under a diagonal covariance assumption. Experiments reveal that Var-Part performs similarly to PCA partitioning, sometimes better, and leads K-means (and GMM) to yield sum-squared-error (and maximum-likelihood) values close to the optimum values obtained by several random-start runs, often at faster convergence rates.
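The two deterministic initializers described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: both routines repeatedly split the cluster with the largest sum-squared error, PCA partitioning along the cluster's principal eigenvector and Var-Part along the single coordinate axis with the largest variance (the diagonal-covariance approximation). Function names and the split-at-the-mean rule are assumptions for illustration.

```python
import numpy as np

def pca_part_init(X, k):
    """PCA-partitioning initialization (sketch): repeatedly split the
    cluster with the largest sum-squared error along its principal
    eigenvector, at the cluster mean, until k clusters exist.
    Returns the k initial centroids."""
    clusters = [X]
    while len(clusters) < k:
        # Choose the cluster contributing the largest sum-squared error.
        sse = [np.sum((c - c.mean(axis=0)) ** 2) for c in clusters]
        c = clusters.pop(int(np.argmax(sse)))
        centered = c - c.mean(axis=0)
        # Principal eigenvector via SVD of the centered data.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]
        # Split at the projected mean (zero after centering).
        clusters.append(c[proj <= 0])
        clusters.append(c[proj > 0])
    return np.array([c.mean(axis=0) for c in clusters])

def var_part_init(X, k):
    """Var-Part (sketch): same splitting scheme, but the split direction
    is the coordinate axis with the largest variance, so no
    eigendecomposition is needed (cost of about one K-means iteration)."""
    clusters = [X]
    while len(clusters) < k:
        sse = [np.sum((c - c.mean(axis=0)) ** 2) for c in clusters]
        c = clusters.pop(int(np.argmax(sse)))
        d = int(np.argmax(c.var(axis=0)))  # highest-variance axis
        mask = c[:, d] <= c[:, d].mean()   # split at the mean on that axis
        clusters.append(c[mask])
        clusters.append(c[~mask])
    return np.array([c.mean(axis=0) for c in clusters])
```

The returned centroids would then be passed to K-means (or used to initialize GMM responsibilities) in place of a random start.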
Pages: 319-338
Number of pages: 20
Related papers
50 records in total
  • [41] Transformed K-means Clustering
    Goel, Anurag
    Majumdar, Angshul
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1526 - 1530
  • [42] Balanced K-Means for Clustering
    Malinen, Mikko I.
    Franti, Pasi
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2014, 8621 : 32 - 41
  • [43] Discriminative k-Means Clustering
    Arandjelovic, Ognjen
    [J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [44] Spherical k-Means Clustering
    Hornik, Kurt
    Feinerer, Ingo
    Kober, Martin
    Buchta, Christian
    [J]. JOURNAL OF STATISTICAL SOFTWARE, 2012, 50 (10): : 1 - 22
  • [45] Subspace K-means clustering
    Timmerman, Marieke E.
    Ceulemans, Eva
    De Roover, Kim
    Van Leeuwen, Karla
    [J]. BEHAVIOR RESEARCH METHODS, 2013, 45 (04) : 1011 - 1023
  • [46] K-Means Clustering Explained
    Emerson, Robert Wall
    [J]. JOURNAL OF VISUAL IMPAIRMENT & BLINDNESS, 2024, 118 (01) : 65 - 66
  • [47] Power k-Means Clustering
    Xu, Jason
    Lange, Kenneth
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [49] k-means clustering of extremes
    Janssen, Anja
    Wan, Phyllis
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (01): : 1211 - 1233
  • [50] K-means clustering on CGRA
    Lopes, Joao D.
    de Sousa, Jose T.
    Neto, Horacio
    Vestias, Mario
    [J]. 2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,