Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation

Times cited: 4
Authors
Rykov, Andrei [1]
de Amorim, Renato Cordeiro [2]
Makarenkov, Vladimir [3, 4]
Mirkin, Boris [1, 5]
Affiliations
[1] Natl Res Univ Higher Sch Econ, Dept Data Anal & Machine Intelligence, Moscow 101000, Russia
[2] Univ Essex, Comp Sci & Elect Engn Dept, Wivenhoe CO4 3SQ, England
[3] Imagia Cybernet, Montreal, PQ H3C 3P8, Canada
[4] Mila Quebec AI Inst, Montreal, PQ H2S 3H1, Canada
[5] Univ London, Dept Comp Sci & Informat Syst, London WC1E 7HX, England
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Indexes; Clustering algorithms; Euclidean distance; Amplitude modulation; Partitioning algorithms; Computer science; K-means; number of clusters; inertia; elbow method; Calinski-Harabasz index; Hartigan rule; Algorithm
DOI
10.1109/ACCESS.2024.3350791
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper gives an experimentally supported review and comparison of several indices, based on the conventional K-means inertia criterion, for determining the number of clusters, K, in a dataset, using the popular Silhouette width index as a benchmark. Our experiments involve a novel version of the Elbow index, defined using values of K two or three steps apart. We also discuss alternative ways of computing the inertia and summarizing its values. Although there is no overall winner in our experiments, some of our results are conclusive and can serve as a guide to choosing an index for determining the number of clusters in K-means.
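To make the inertia-based indices named in the abstract and keywords concrete, here is a minimal NumPy sketch under stated assumptions. It computes the K-means inertia W(K) (the within-cluster sum of squared Euclidean distances) with Lloyd's algorithm, k-means++ seeding and several restarts, then scores candidate values of K with the Calinski-Harabasz index, Hartigan's rule (the smallest K with H(K) <= 10, Hartigan's conventional threshold), and the Silhouette width used as the benchmark. The stepped Elbow score E_s(K) = (W(K-s) - W(K)) / (W(K) - W(K+s)) is a hypothetical illustration of comparing values of K s steps apart; it is not the paper's definition. The function names (`kmeans`, `kmeanspp_init`, `silhouette`, `stepped_elbow`) and the three-blob synthetic dataset are likewise assumptions made only for this example.

```python
import numpy as np


def kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centres)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with restarts; returns (inertia W(K), labels) of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_w, best_labels = np.inf, None
    for _ in range(n_init):
        centres = kmeanspp_init(X, k, rng)
        for _ in range(n_iter):
            # Assignment step: nearest centre in squared Euclidean distance.
            labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # Update step: move each non-empty cluster's centre to its mean.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        # Inertia: sum of squared distances from each point to its cluster's centre.
        w = ((X - centres[labels]) ** 2).sum()
        if w < best_w:
            best_w, best_labels = w, labels
    return best_w, best_labels


def silhouette(X, labels):
    """Mean Silhouette width; points in singleton clusters contribute s(i) = 0."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0, so this averages over the others
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def stepped_elbow(W, s, k_max):
    """HYPOTHETICAL Elbow score: drop in W over the s steps before K vs. the s steps after K."""
    return {k: (W[k - s] - W[k]) / (W[k] - W[k + s]) for k in range(s + 1, k_max + 1)}


# Assumed synthetic data: three well-separated 2-D Gaussian blobs of 50 points each.
rng = np.random.default_rng(42)
true_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in true_centres])
N, K_MAX = len(X), 6

# W(K) for K = 1..K_MAX + 2, so scores that look up to two steps ahead are defined.
fits = {k: kmeans(X, k) for k in range(1, K_MAX + 3)}
W = {k: w for k, (w, _) in fits.items()}

# Calinski-Harabasz: (T - W(K))/(K - 1) over W(K)/(N - K); W(1) equals the total scatter T.
ch = {k: ((W[1] - W[k]) / (k - 1)) / (W[k] / (N - k)) for k in range(2, K_MAX + 1)}

# Hartigan's rule: smallest K with H(K) = (W(K)/W(K+1) - 1)(N - K - 1) <= 10.
hartigan = {k: (W[k] / W[k + 1] - 1) * (N - k - 1) for k in range(1, K_MAX + 1)}
k_hartigan = next((k for k in sorted(hartigan) if hartigan[k] <= 10), None)

elbow = {s: stepped_elbow(W, s, K_MAX) for s in (1, 2)}
sil = {k: silhouette(X, fits[k][1]) for k in range(2, K_MAX + 1)}

for name, scores in [("Calinski-Harabasz", ch), ("Elbow, s=1", elbow[1]),
                     ("Elbow, s=2", elbow[2]), ("Silhouette", sil)]:
    print(f"{name}: K = {max(scores, key=scores.get)}")
print(f"Hartigan (threshold 10): K = {k_hartigan}")
```

Because Lloyd's algorithm only reaches local optima, W(K) here is taken as the best of several restarts; an index computed from a suboptimal W(K) can point to a different K, which is one reason the way the inertia is computed matters when comparing these indices.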
Pages: 11761-11773
Number of pages: 13