Estimating the number of clusters in a dataset via consensus clustering

Cited by: 79
Authors
Unlu, Ramazan [1]
Xanthopoulos, Petros [2]
Affiliations
[1] Gumushane Univ, Dept Management & Informat Syst, Gumushanevi Kampusu,Baglarbasi Mahallesi, TR-29100 Gumushane, Turkey
[2] Stetson Univ, Sch Business Adm, Decis & Informat Sci Dept, 421 N Woodland Blvd, Deland, FL 32723 USA
Keywords
Weighted consensus clustering; Validity indices; Number of clusters; MICROARRAY DATA; SELECTION; VALIDATION;
DOI
10.1016/j.eswa.2019.01.074
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In unsupervised learning, finding the appropriate number of clusters, usually denoted k, is very challenging. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach for tackling this problem is to apply a certain clustering algorithm with various cluster configurations and use the one that maximizes a certain internal validity measure. This is a promising and computationally efficient approach since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k estimation scheme. The weighted consensus clustering scheme employs four different indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments in a dataset with clusters ranging from 2 to 7 show the profound advantages of weighted consensus clustering for correctly finding k in comparison to an individual clustering method (e.g., k-means) and simple consensus clustering. (C) 2019 Elsevier Ltd. All rights reserved.
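The baseline that the abstract sets out to improve on (run one clustering algorithm for several values of k and keep the k with the best internal validity score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy data, the farthest-first initialisation, and the helper names `kmeans` and `mean_silhouette` are assumptions made for the example. Only the Silhouette index is computed, and the weighted consensus step is not shown.

```python
import math

# Toy data (an assumption for illustration): three tight, well-separated 2-D blobs.
CENTERS = [(0.0, 0.0), (10.0, 0.0), (3.0, 9.0)]
OFFSETS = [(0.0, 0.0), (0.5, 0.5), (-0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in CENTERS for dx, dy in OFFSETS]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean Silhouette index; points in singleton clusters score 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Each k is evaluated independently, which is why the runs parallelise easily.
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3 for this toy data
```

On real data a single index can disagree with the others, which is the situation the abstract's combination of four indices is meant to address.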
Pages: 33-39
Number of pages: 7
Related papers
50 in total
  • [1] Estimating the number of clusters via a corrected clustering instability
    Haslbeck, Jonas M. B.
    Wulff, Dirk U.
    [J]. COMPUTATIONAL STATISTICS, 2020, 35 (04) : 1879 - 1894
  • [2] Estimating the number of clusters via a corrected clustering instability
    Jonas M. B. Haslbeck
    Dirk U. Wulff
    [J]. Computational Statistics, 2020, 35 : 1879 - 1894
  • [3] Estimating the predominant number of clusters in a dataset
    Al Shaqsi, Jamil
    Wang, Wenjia
    [J]. INTELLIGENT DATA ANALYSIS, 2013, 17 (04) : 603 - 626
  • [4] On the Persistence of Clustering Solutions and True Number of Clusters in a Dataset
    Srivastava, Amber
    Baranwal, Mayank
    Salapaka, Srinivasa
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5000 - 5007
  • [5] Estimating the Number of Clusters Based on Sequential Clustering Algorithms
    Real, Eduardo Machado
    [J]. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 229 - 234
  • [6] A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering
    Guoqi Qian
    Yuehua Wu
    Qing Shao
    [J]. Journal of Classification, 2009, 26 : 183 - 199
  • [7] A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering
    Qian, Guoqi
    Wu, Yuehua
    Shao, Qing
    [J]. JOURNAL OF CLASSIFICATION, 2009, 26 (02) : 183 - 199
  • [8] A randomized PTAS for the minimum Consensus Clustering with a fixed number of clusters
    Bonizzoni, Paola
    Della Vedova, Gianluca
    Dondi, Riccardo
    [J]. THEORETICAL COMPUTER SCIENCE, 2012, 429 : 36 - 45
  • [9] Estimating the Number of Clusters via the GUD Statistic
    Kou, Jiyao
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2014, 23 (02) : 403 - 417
  • [10] AutoElbow: An Automatic Elbow Detection Method for Estimating the Number of Clusters in a Dataset
    Onumanyi, Adeiza James
    Molokomme, Daisy Nkele
    Isaac, Sherrin John
    Abu-Mahfouz, Adnan M.
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (15):