Estimating the number of clusters from distributional results of partitioning a given data set

Cited by: 3
Author
Möller, U [1]
Affiliation
[1] Hans Knoll Inst Nat Prod Res Jena, Bioinformat Pattern Recognit Grp, Jena, Germany
DOI
10.1007/3-211-27389-1_36
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When estimating the optimal value of the number of clusters, C, of a given data set, one typically uses, for each candidate value of C, a single (final) result of the clustering algorithm. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that utilizes distributional data generated by clustering the original data T times in the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher compared to recently published results of several classical methods and a new statistical approach (Clest).
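The idea of clustering the same data T times and working with the resulting distribution of validity-index values can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the validity index, or the rule for turning the T values into an estimate of C, so everything specific here is an assumption made for illustration: k-means with a randomly chosen first seed, the Davies-Bouldin index (lower is better), and the median over the T runs as the summary. The function names (`kmeans`, `davies_bouldin`, `estimate_num_clusters`, `make_blobs`) are hypothetical and this is not the paper's procedure.

```python
import math
import random
import statistics


def kmeans(points, c, rng, iters=100):
    """Lloyd's k-means: random first seed, farthest-first remaining seeds (assumed init)."""
    centers = [rng.choice(points)]
    while len(centers) < c:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, m) for m in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(c), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index over non-empty clusters; needs at least two of them."""
    used = sorted(set(labels))
    scatter = {}
    for j in used:
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter[j] = sum(math.dist(p, centers[j]) for p in members) / len(members)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in used if j != i)
        for i in used
    ]
    return sum(worst) / len(worst)


def estimate_num_clusters(points, candidates, T=20, seed=0):
    """Cluster the ORIGINAL data T times per candidate C; no resampled data sets."""
    rng = random.Random(seed)
    distributions = {}
    for c in candidates:
        values = []
        for _ in range(T):
            centers, labels = kmeans(points, c, rng)
            values.append(davies_bouldin(points, centers, labels))
        distributions[c] = values
    # Assumed decision rule: pick the C with the smallest median index value.
    medians = {c: statistics.median(v) for c, v in distributions.items()}
    return min(medians, key=medians.get), distributions


def make_blobs(rng, means, n=30, sd=0.5):
    """Synthetic 2-D model data: n Gaussian points around each mean."""
    return [(rng.gauss(mx, sd), rng.gauss(my, sd)) for mx, my in means for _ in range(n)]


data = make_blobs(random.Random(1), [(0, 0), (10, 0), (5, 9)])
best_c, dists = estimate_num_clusters(data, candidates=range(2, 7))
print("estimated number of clusters:", best_c)
```

The point of the sketch is the contrast drawn in the abstract: the T values per candidate C come from repeated runs on one fixed data set, whereas a bootstrap-based method would draw T resampled data sets first. `dists` holds the full distribution for each C, so a different summary (for example a quantile or a vote across runs) can be swapped in for the median.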
Pages: 151-154
Number of pages: 4