Estimating the number of clusters from distributional results of partitioning a given data set

Cited by: 3
Authors
Möller, U [1]
Affiliations
[1] Hans Knoll Inst Nat Prod Res Jena, Bioinformat Pattern Recognit Grp, Jena, Germany
DOI
10.1007/3-211-27389-1_36
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When estimating the optimal number of clusters, C, of a given data set, one typically uses, for each candidate value of C, a single (final) result of the clustering algorithm. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that utilizes distributional data generated by clustering the original data T times in the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher than in recently published results for several classical methods and a new statistical approach (Clest).
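The general idea in the abstract, clustering the same data T times for each candidate C and looking at the resulting distribution of a validity index, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the clustering algorithm (k-means with k-means++ style seeding), the validity index (a centroid-based simplified silhouette), the median as the summary of each distribution, and all function names.

```python
import math
import random
from statistics import median

def init_centers(points, c, rng):
    """k-means++ style seeding (an assumption, not the paper's method)."""
    centers = [rng.choice(points)]
    while len(centers) < c:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(math.dist(p, m) for m in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, c, rng, n_iter=100):
    """Lloyd's algorithm from one random seeding; returns (labels, centers)."""
    centers = init_centers(points, c, rng)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(c), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(c):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster runs empty
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

def simplified_silhouette(points, labels, centers):
    """Centroid-based silhouette; higher means compact, well-separated clusters."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])  # distance to own centroid
        b = min(math.dist(p, m) for k, m in enumerate(centers) if k != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def index_distributions(points, candidates, T, seed=0):
    """Cluster the ORIGINAL data T times per candidate C; collect index values."""
    rng = random.Random(seed)
    return {c: [simplified_silhouette(points, *kmeans(points, c, rng))
                for _ in range(T)]
            for c in candidates}

def estimate_num_clusters(dists):
    """Pick the C whose index distribution has the highest median (an assumption)."""
    return max(dists, key=lambda c: median(dists[c]))

# Toy model data: three well-separated Gaussian blobs in 2D (hypothetical).
data_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(40)]

dists = index_distributions(points, candidates=range(2, 6), T=20)
estimate = estimate_num_clusters(dists)
print("estimated number of clusters:", estimate)
```

Unlike a bootstrap scheme, no resampled data sets are created: the variation across the T runs comes only from the random seeding of the optimizer. How the paper actually summarizes and compares the distributions is not stated in the abstract, so the median used above is only one plausible choice.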
Pages: 151-154
Number of pages: 4
Related papers
50 in total
  • [31] A Method to Find Optimum Number of Clusters Based on Fuzzy Silhouette on Dynamic Data Set
    Subbalakshmi, Chatti
    Krishna, G. Rama
    Rao, S. Krishna Mohan
    Rao, P. Venketeswa
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES, ICICT 2014, 2015, 46 : 346 - 353
  • [32] Cycles with a given number of vertices from each partite set in regular multipartite tournaments
    Volkmann, Lutz
    Winzen, Stefan
    [J]. CZECHOSLOVAK MATHEMATICAL JOURNAL, 2006, 56 : 827 - 844
  • [33] Paths with a given number of vertices from each partite set in regular multipartite tournaments
    Volkmann, Lutz
    Winzen, Stefan
    [J]. DISCRETE MATHEMATICS, 2006, 306 (21) : 2724 - 2732
  • [34] Cycles with a given number of vertices from each partite set in regular multipartite tournaments
    Volkmann, Lutz
    Winzen, Stefan
    [J]. CZECHOSLOVAK MATHEMATICAL JOURNAL, 2006, 56 (03) : 827 - 843
  • [35] A hierarchical Gamma Mixture Model-based method for estimating the number of clusters in complex data
    Azhar, Muhammad
    Huang, Joshua Zhexue
    Masud, Md Abdul
    Li, Mark Junjie
    Cui, Laizhong
    [J]. APPLIED SOFT COMPUTING, 2020, 87
  • [36] Estimating the number of drug injectors from needle exchange data
    Hay, G
    Smit, F
    [J]. ADDICTION RESEARCH & THEORY, 2003, 11 (04) : 235 - 243
  • [37] Estimating the basic reproduction number from noisy daily data
    Descary, Marie-Helene
    Froda, Sorana
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 2022, 549
  • [38] Estimating the Basic Reproductive Number from Viral Sequence Data
    Stadler, Tanja
    Kouyos, Roger
    von Wyl, Viktor
    Yerly, Sabine
    Boeni, Juerg
    Buergisser, Philippe
    Klimkait, Thomas
    Joos, Beda
    Rieder, Philip
    Xie, Dong
    Guenthard, Huldrych F.
    Drummond, Alexei J.
    Bonhoeffer, Sebastian
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2012, 29 (01) : 347 - 357
  • [39] The determination of the best value of the coupling-ratio from a given set of data
    Engledow, FL
    Yule, GU
    [J]. PROCEEDINGS OF THE CAMBRIDGE PHILOSOPHICAL SOCIETY, 1914, 17 : 436 - 440
  • [40] Estimating the basic reproduction number from surveillance data on past epidemics
    Froda, Sorana
    Leduc, Hugues
    [J]. MATHEMATICAL BIOSCIENCES, 2014, 256 : 89 - 101