Estimating the number of clusters via a corrected clustering instability

Cited by: 0
Authors
Jonas M. B. Haslbeck
Dirk U. Wulff
Affiliations
[1] University of Amsterdam,Psychological Methods Group
[2] University of Basel,Center for Cognitive and Decision Science
[3] Max Planck Institute for Human Development,Center for Adaptive Rationality
Source
Computational Statistics, 2020, Vol. 35
Keywords
Cluster analysis; k-means; Stability; Resampling
DOI
Not available
Abstract
We improve instability-based methods for the selection of the number of clusters k in cluster analysis by developing a corrected clustering distance that corrects for the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
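For readers unfamiliar with instability-based selection of k, the general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method and not the cstab API: it computes only the plain (uncorrected) instability, because the abstract does not specify the correction. The helper names (`kmeans`, `clustering_distance`, `instability`), the NumPy-only k-means, the choice to compare two bootstrap clusterings on the objects they share, and the toy data are all assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, rng, n_init=10, n_iter=20):
    """Lloyd's algorithm with random restarts (illustrative, not optimized)."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # Fancy indexing copies the rows, so updating centers leaves X intact.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):  # leave an empty cluster's center as is
                    centers[j] = X[labels == j].mean(axis=0)
        cost = dist[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def clustering_distance(a, b):
    """Fraction of object pairs on which two labelings disagree about
    whether the pair belongs to the same cluster (label names are ignored)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] != same_b[upper]))


def instability(X, k, n_pairs=10, seed=0):
    """Mean clustering distance over pairs of bootstrap samples (uncorrected)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    distances = []
    for _ in range(n_pairs):
        idx_a = rng.choice(n, size=n, replace=True)
        idx_b = rng.choice(n, size=n, replace=True)
        # Map sample labels back to the original objects; -1 marks "not drawn".
        full_a = np.full(n, -1)
        full_b = np.full(n, -1)
        full_a[idx_a] = kmeans(X[idx_a], k, rng)
        full_b[idx_b] = kmeans(X[idx_b], k, rng)
        # Assumption: compare the two clusterings only on shared objects.
        shared = (full_a >= 0) & (full_b >= 0)
        distances.append(clustering_distance(full_a[shared], full_b[shared]))
    return float(np.mean(distances))


# Toy data (hypothetical): three well-separated 2-D Gaussian clusters.
data_rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([m + data_rng.normal(size=(40, 2)) for m in means])

scores = {k: instability(X, k) for k in range(2, 7)}
best_k = min(scores, key=scores.get)
for k, value in scores.items():
    print(f"k={k}: instability={value:.3f}")
print("selected k:", best_k)
```

The selected k is the one with the lowest mean distance between clusterings of resampled data. The sketch does not address the dependence of this distance on the distribution of cluster sizes, which is the issue the abstract says the corrected measure is designed to remove.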
Pages: 1879 - 1894
Number of pages: 15
Related papers
50 in total
  • [1] Estimating the number of clusters via a corrected clustering instability
    Haslbeck, Jonas M. B.
    Wulff, Dirk U.
    [J]. COMPUTATIONAL STATISTICS, 2020, 35 (04) : 1879 - 1894
  • [2] Estimating the number of clusters in a dataset via consensus clustering
    Unlu, Ramazan
    Xanthopoulos, Petros
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 125 : 33 - 39
  • [3] Estimating the Number of Clusters Based on Sequential Clustering Algorithms
    Real, Eduardo Machado
    [J]. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 229 - 234
  • [4] A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering
    Guoqi Qian
    Yuehua Wu
    Qing Shao
    [J]. Journal of Classification, 2009, 26 : 183 - 199
  • [5] A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering
    Qian, Guoqi
    Wu, Yuehua
    Shao, Qing
    [J]. JOURNAL OF CLASSIFICATION, 2009, 26 (02) : 183 - 199
  • [6] Estimating the Number of Clusters via the GUD Statistic
    Kou, Jiyao
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2014, 23 (02) : 403 - 417
  • [7] Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
    Dinh, Duy-Tai
    Fujinami, Tsutomu
    Huynh, Van-Nam
    [J]. KNOWLEDGE AND SYSTEMS SCIENCES, KSS 2019, 2019, 1103 : 1 - 17
  • [8] Sequential clustering with particle filters - Estimating the number of clusters from data
    Schubert, J
    Sidenbladh, H
    [J]. 2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005, : 122 - 129
  • [9] Estimating the number of clusters
    Cuevas, A
    Febrero, M
    Fraiman, R
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2000, 28 (02) : 367 - 382
  • [10] Estimating the number of clusters in a data set via the gap statistic
    Tibshirani, R
    Walther, G
    Hastie, T
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2001, 63 : 411 - 423