Estimating the number of clusters via a corrected clustering instability

Cited by: 0
Authors
Jonas M. B. Haslbeck
Dirk U. Wulff
Affiliations
[1] University of Amsterdam,Psychological Methods Group
[2] University of Basel,Center for Cognitive and Decision Science
[3] Max Planck Institute for Human Development,Center for Adaptive Rationality
Source
Computational Statistics | 2020 / Vol. 35
Keywords
Cluster analysis; k-means; Stability; Resampling
DOI
Not available
Abstract
We improve instability-based methods for selecting the number of clusters k in cluster analysis by developing a corrected clustering distance that removes the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
Pages: 1879 - 1894
Page count: 15
Related papers
50 in total
  • [31] RSQRT: An heuristic for estimating the number of clusters to report
    Carlis, John
    Bruso, Kelsey
    [J]. ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, 2012, 11 (02) : 152 - 158
  • [32] Video Face Clustering with Unknown Number of Clusters
    Tapaswi, Makarand
    Law, Marc T.
    Fidler, Sanja
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5026 - 5035
  • [33] Estimating the number of clusters in a ranking data context
    Calmon, Wilson
    Albi, Mariana
    [J]. INFORMATION SCIENCES, 2021, 546 : 977 - 995
  • [34] Adaptive optimization of the number of clusters in fuzzy clustering
    Beringer, Juergen
    Huellermeier, Eyke
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-4, 2007, : 657 - +
  • [35] Estimating the number of clusters using a windowing technique
    Boutsinas B.
    Tasoulis D.K.
    Vrahatis M.N.
    [J]. Pattern Recognition and Image Analysis, 2006, 16 (2) : 143 - 154
  • [36] ESTIMATING INTRINSIC DIMENSION VIA CLUSTERING
    Eriksson, Brian
    Crovella, Mark
    [J]. 2012 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2012, : 760 - 763
  • [37] Estimating the number of species via a martingale estimating function
    Chao, A
    Yip, P
    Lin, HS
    [J]. STATISTICA SINICA, 1996, 6 (02) : 403 - 418
  • [38] Does Number of Clusters Effect the Purity and Entropy of Clustering?
    Uddin, Jamal
    Ghazali, Rozaida
    Deris, Mustafa Mat
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING, 2017, 549 : 355 - 365
  • [39] UNSUPERVISED CLUSTERING ON SIGNED GRAPHS WITH UNKNOWN NUMBER OF CLUSTERS
    Dittrich, Thomas
    Matz, Gerald
    [J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1060 - 1064
  • [40] The upper bound of the optimal number of clusters in fuzzy clustering
    于剑
    程乾生
    [J]. Science China(Information Sciences), 2001, (02) : 119 - 125