Choosing the Number of Clusters in K-Means Clustering

Cited by: 70
Authors
Steinley, Douglas [1]
Brusco, Michael J. [2]
Affiliations
[1] Univ Missouri, Dept Psychol Sci, Columbia, MO 65203 USA
[2] Florida State Univ, Dept Mkt, Tallahassee, FL 32306 USA
Keywords
cluster analysis; K-means clustering; choosing the number of clusters; cross-validation; local optima; psychology; variables; selection
DOI
10.1037/a0023346
Chinese Library Classification number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.
Pages: 285-297
Page count: 13
Related papers
50 in total
  • [1] Setting the number of clusters in K-means clustering
    Huh, MH
    [J]. RECENT ADVANCES IN STATISTICAL RESEARCH AND DATA ANALYSIS, 2002, : 115 - 124
  • [2] Variable Weighting in Fuzzy k-Means Clustering to Determine the Number of Clusters
    Khan, Imran
    Luo, Zongwei
    Huang, Joshua Zhexue
    Shahzad, Waseem
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (09) : 1838 - 1853
  • [3] Agglomerative fuzzy K-Means clustering algorithm with selection of number of clusters
    Li, Mark Junjie
    Ng, Michael K.
    Cheung, Yiu-ming
    Huang, Joshua Zhexue
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1519 - 1534
  • [4] Experiments for the number of clusters in K-Means
    Chiang, Mark Ming-Tso
    Mirkin, Boris
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 395 - 405
  • [5] Entropy K-Means Clustering With Feature Reduction Under Unknown Number of Clusters
    Sinaga, Kristina P.
    Hussain, Ishtiaq
    Yang, Miin-Shen
    [J]. IEEE ACCESS, 2021, 9 : 67736 - 67751
  • [6] Automatic estimation of clusters number for K-means
    Sabri, My Abdelouahed
    Ennouni, Assia
    Aarab, Abdellah
    [J]. 2016 4TH IEEE INTERNATIONAL COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST), 2016, : 450 - 454
  • [8] An entropy-based initialization method of K-means clustering on the optimal number of clusters
    Chowdhury, Kuntal
    Chaudhuri, Debasis
    Pal, Arup Kumar
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (12) : 6965 - 6982
  • [9] Selection of Optimal Number of Clusters and Centroids for K-means and Fuzzy C-means Clustering: A Review
    Pugazhenthi, A.
    Kumar, Lakshmi Sutha
    [J]. PROCEEDINGS OF THE 2020 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS-2020), 2020,
  • [10] k*-means: A generalized k-means clustering algorithm with unknown cluster number
    Cheung, YM
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 307 - 317