Estimating the Optimal Number of Clusters from Subsets of Ensembles

Cited by: 1
Authors
Odebode, Afees Adegoke [1]
Tucker, Allan [1]
Arzoky, Mahir [1]
Swift, Stephen [1]
Affiliations
[1] Brunel Univ, London, England
Keywords
Ensemble Clustering; Subset Selection; Cluster Analysis; Number of Clusters; CLASSIFICATION; CRITERION
DOI
10.5220/0011275000003269
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research estimates the optimal number of clusters in a dataset using a novel ensemble technique, a preferred alternative to relying on the output of a single clustering. Combining clusterings from different algorithms can lead to a more stable and robust solution, often unattainable by any single clustering solution. Technically, we created subsets of ensembles as possible estimates and evaluated them using a quality metric to obtain the best subset. We tested our method on publicly available datasets of varying types, sources and clustering difficulty to establish the accuracy and performance of our approach against eight standard methods. Our method outperforms all of these techniques in the number of clusters estimated correctly. Because the initial algorithm is exhaustive, it slows down as the number of ensembles or the solution space increases; hence, we have provided an updated version based on the single-digit difference of Gray code that runs in linear time in terms of the subset size.
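The Gray code idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gray_code_subsets` and `best_subset`, the integer weights, and the toy quality measure (distance of a running total from a target) are all assumptions made for illustration. The point is only that consecutive subsets in binary-reflected Gray code order differ by exactly one member, so a score can be updated incrementally instead of being recomputed for every subset.

```python
from typing import Iterator, List, Tuple


def gray_code_subsets(n: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (mask, index, added) for every non-empty subset of n items.

    Masks follow binary-reflected Gray code order, so each mask differs
    from the previous one in exactly one bit. `index` is the position of
    that bit and `added` says whether the item entered or left the subset.
    """
    prev = 0
    for i in range(1, 1 << n):
        mask = i ^ (i >> 1)          # i-th Gray code word
        diff = mask ^ prev           # exactly one bit is set here
        index = diff.bit_length() - 1
        added = bool(mask & diff)
        yield mask, index, added
        prev = mask


def best_subset(weights: List[int], target: int) -> Tuple[int, int]:
    """Toy search (hypothetical metric): find the subset whose total
    weight is closest to `target`, updating the total one item at a time."""
    total = 0
    best_mask, best_gap = 0, float("inf")
    for mask, index, added in gray_code_subsets(len(weights)):
        # Single-item update instead of re-summing the whole subset.
        total += weights[index] if added else -weights[index]
        gap = abs(total - target)
        if gap < best_gap:
            best_mask, best_gap = mask, gap
    return best_mask, best_gap
```

In this sketch the search still visits every subset; only the per-subset update becomes a single-item change. How the paper's quality metric is updated at each step is not described in the abstract and is not modelled here.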
Pages: 383-391
Page count: 9