Estimating the number of clusters from distributional results of partitioning a given data set

Cited by: 3
Author
Möller, U [1]
Affiliation
[1] Hans Knoll Inst Nat Prod Res Jena, Bioinformat Pattern Recognit Grp, Jena, Germany
DOI
10.1007/3-211-27389-1_36
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When estimating the optimal number of clusters, C, of a given data set, one typically uses, for each candidate value of C, a single (final) result of the clustering algorithm. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that utilizes distributional data generated by clustering the original data T times in the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher than in recently published results of several classical methods and a new statistical approach (Clest).
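The general idea of the abstract, clustering the same data T times per candidate C and judging C from the resulting distribution of a validity index, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: k-means with k-means++ seeding as the cost-optimizing clusterer, the mean silhouette width as the validity index, the median over the T runs as the decision rule, and all function names (`kmeans`, `mean_silhouette`, `estimate_num_clusters`) and the toy data.

```python
import math
import random
import statistics


def kmeans(points, c, rng, max_iter=100):
    """One k-means run with k-means++ seeding; returns one label per point."""
    centers = [rng.choice(points)]
    while len(centers) < c:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(math.dist(p, q) for q in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(c), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width of a partition (assumed validity index)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def estimate_num_clusters(points, candidates, t=20, seed=0):
    """Cluster the original data t times per candidate C and pick the C
    whose index distribution has the highest median (assumed rule)."""
    rng = random.Random(seed)
    scores = {c: [mean_silhouette(points, kmeans(points, c, rng))
                  for _ in range(t)]
              for c in candidates}
    best = max(candidates, key=lambda c: statistics.median(scores[c]))
    return best, scores


# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
data_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
        for cx, cy in blob_centers for _ in range(20)]

best_c, scores = estimate_num_clusters(data, candidates=[2, 3, 4, 5])
print("estimated number of clusters:", best_c)
```

Note that no resampling takes place: the T results differ only through the random initialization of the clusterer, which is what separates this setting from bootstrap-based approaches. Other summaries of the per-C distribution (for example quantiles or the spread) could replace the median.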
Pages: 151-154
Page count: 4