Estimating the number of clusters in a dataset via consensus clustering

Cited by: 79
Authors
Unlu, Ramazan [1]
Xanthopoulos, Petros [2]
Affiliations
[1] Gumushane Univ, Dept Management & Informat Syst, Gumushanevi Kampusu,Baglarbasi Mahallesi, TR-29100 Gumushane, Turkey
[2] Stetson Univ, Sch Business Adm, Decis & Informat Sci Dept, 421 N Woodland Blvd, Deland, FL 32723 USA
Keywords
Weighted consensus clustering; Validity indices; Number of clusters; Microarray data; Selection; Validation
DOI
10.1016/j.eswa.2019.01.074
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In unsupervised learning, finding the appropriate number of clusters, usually denoted k, is a challenging problem. It matters because k is a vital hyperparameter for most clustering algorithms. One algorithmic approach to this problem is to run a given clustering algorithm with various cluster configurations (values of k) and select the configuration that optimizes a chosen internal validity measure. This approach is promising and computationally efficient, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on datasets with between 2 and 7 clusters show clear advantages of weighted consensus clustering for correctly finding k, compared with an individual clustering method (e.g., k-means) and with simple consensus clustering. © 2019 Elsevier Ltd. All rights reserved.
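The baseline the abstract describes (run one clustering algorithm for a range of k and keep the k that optimizes an internal validity index) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' weighted consensus method: it assumes k-means with a deterministic farthest-first initialisation (a choice made here only for reproducibility), implements just the Silhouette and Davies-Bouldin indices from scratch, and the names `synthetic_blobs`, `kmeans`, `silhouette`, `davies_bouldin`, and `estimate_k` are hypothetical.

```python
import math
import random


def synthetic_blobs(n_per=30, seed=42):
    """Hypothetical test data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(n_per)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    # Farthest-first (maximin) seeding: start at the first point, then keep adding
    # the point farthest from every centroid chosen so far (assumed, for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def _clusters(points, labels):
    """Group points by label; labels that received no points are absent."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def silhouette(points, labels):
    """Mean silhouette width (higher is better); singleton clusters score 0."""
    groups = _clusters(points, labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of p's cluster (p contributes 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better)."""
    groups = list(_clusters(points, labels).values())
    cents = [tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups]
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(len(groups)) if j != i)
             for i in range(len(groups))]
    return sum(worst) / len(worst)


def estimate_k(points, k_range, index, higher_is_better=True):
    """Run k-means once per k and keep the k with the best validity score."""
    scores = {k: index(points, kmeans(points, k)) for k in k_range}
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get), scores


if __name__ == "__main__":
    data = synthetic_blobs()
    k_sh, _ = estimate_k(data, range(2, 7), silhouette)
    k_db, _ = estimate_k(data, range(2, 7), davies_bouldin, higher_is_better=False)
    print("Silhouette picks k =", k_sh)
    print("Davies-Bouldin picks k =", k_db)
```

Note that the Davies-Bouldin run passes `higher_is_better=False`, because lower values indicate more compact, better-separated clusters. On well-separated blobs like these, both indices should agree; on real data the single indices can disagree, which is the situation the authors' weighted consensus over several indices is meant to address (that weighting is not reproduced here).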
Pages: 33-39
Page count: 7