Choosing the Number of Clusters in K-Means Clustering

Cited by: 70

Authors:

Steinley, Douglas [1]

Brusco, Michael J. [2]

Affiliations:

[1] Univ Missouri, Dept Psychol Sci, Columbia, MO 65203 USA

[2] Florida State Univ, Dept Mkt, Tallahassee, FL 32306 USA

Source:

PSYCHOLOGICAL METHODS | 2011 / Vol. 16 / No. 3

Keywords:

cluster analysis; K-means clustering; choosing the number of clusters; CROSS-VALIDATION; LOCAL OPTIMA; PSYCHOLOGY; VARIABLES; SELECTION;

DOI:

10.1037/a0023346

Chinese Library Classification:

B84 [Psychology];

Subject classification code:

04 ; 0402 ;

Abstract:

Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple clusters, the procedure is extended to determine the number of clusters. Through a series of simulations, the proposed methodology is shown to outperform several other commonly used procedures for determining both the presence of clusters and their number.
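
As a minimal illustration of the sum-of-squares error criterion that the abstract refers to, the sketch below computes that criterion for several candidate values of K on synthetic data. It does not implement the lower bound of Steinley (2007) or the authors' proposed procedure, neither of which is specified in this record. The function names `sse` and `kmeans`, the plain Lloyd-style iteration with random restarts, and the two-group synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sse(X, labels, centers):
    """Sum-of-squares error: total squared distance from each point to its assigned centroid."""
    return float(((X - centers[labels]) ** 2).sum())


def kmeans(X, k, n_restarts=10, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd-style K-means with random restarts.

    Returns (best_sse, labels, centers) for the restart with the lowest SSE.
    Restarts are used because K-means can converge to local optima.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Initialize centroids at k distinct observations.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign every point to its nearest centroid.
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Recompute centroids; an empty cluster keeps its previous centroid.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Final assignment against the converged centroids.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        err = sse(X, labels, centers)
        if best is None or err < best[0]:
            best = (err, labels, centers)
    return best


if __name__ == "__main__":
    # Synthetic data (assumption): two well-separated groups in two dimensions.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
    for k in range(1, 5):
        print(k, round(kmeans(X, k)[0], 2))
```

The SSE at the best solution found can only stay the same or shrink as K grows, so the raw criterion cannot choose K by itself; this is the reason procedures such as the one proposed in the article compare it against a reference value.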

Pages: 285-297

Page count: 13
