Estimating the number of clusters from distributional results of partitioning a given data set

Cited by: 3
Author
Möller, U [1]
Affiliation
[1] Hans Knoll Inst Nat Prod Res Jena, Bioinformat Pattern Recognit Grp, Jena, Germany
DOI
10.1007/3-211-27389-1_36
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When estimating the optimal number of clusters, C, of a given data set, one typically uses, for each candidate value of C, a single (final) result of the clustering algorithm. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that utilizes distributional data generated by clustering the original data T times in the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher than that reported in recently published results for several classical methods and a new statistical approach (Clest).
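The general idea in the abstract, clustering the same data T times per candidate C and looking at the resulting distribution of a validity index, can be sketched roughly as follows. This is not the paper's procedure: k-means with random initialization stands in for the cost function optimization, a centroid-based simplified silhouette stands in for the validity index, and the decision rule (largest median index) is an assumption made only for illustration. All function names are hypothetical.

```python
import random
import statistics


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, c, rng, iters=50):
    """One k-means run from a random initialization (assumed optimizer).

    Returns (centroids, labels, cost), where cost is the within-cluster
    sum of squared distances.
    """
    centroids = rng.sample(points, c)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(c), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost


def simplified_silhouette(points, centroids, labels):
    """Centroid-based silhouette in [-1, 1] (assumed validity index, c >= 2)."""
    scores = []
    for p, lab in zip(points, labels):
        a = sq_dist(p, centroids[lab]) ** 0.5
        b = min(
            sq_dist(p, m) ** 0.5 for j, m in enumerate(centroids) if j != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def index_distributions(points, candidates, t, seed=0):
    """Cluster the ORIGINAL data t times per candidate c; collect index values."""
    rng = random.Random(seed)
    dist = {}
    for c in candidates:
        values = []
        for _ in range(t):
            centroids, labels, _cost = kmeans(points, c, rng)
            values.append(simplified_silhouette(points, centroids, labels))
        dist[c] = values
    return dist


def estimate_c(dist):
    """Assumed decision rule: the candidate with the largest median index."""
    return max(dist, key=lambda c: statistics.median(dist[c]))


def make_blobs(centers, n_per_blob, sd, rng):
    """Synthetic 2-D model data: Gaussian blobs around the given centers."""
    return [
        (rng.gauss(cx, sd), rng.gauss(cy, sd))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]


points = make_blobs([(0.0, 0.0), (10.0, 0.0)], 25, 0.5, random.Random(1))
dist = index_distributions(points, candidates=[2, 3, 4], t=20, seed=2)
print("estimated number of clusters:", estimate_c(dist))
```

Each entry of `dist` holds T index values for one candidate C, so other summaries of the distribution (spread, quantiles, frequency of the best run) could replace the median without changing the rest of the sketch.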
Pages: 151-154
Number of pages: 4