Estimating the number of clusters from distributional results of partitioning a given data set

Cited by: 3

Authors:

Möller, U [1]

Affiliations:

[1] Hans Knoll Inst Nat Prod Res Jena, Bioinformat Pattern Recognit Grp, Jena, Germany

Source:

Adaptive and Natural Computing Algorithms | 2005

DOI:

10.1007/3-211-27389-1_36

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

When estimating the optimal value of the number of clusters, C, of a given data set, one typically uses, for each candidate value of C, a single (final) result of the clustering algorithm. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that utilizes distributional data generated by clustering the original data T times in the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher than that of recently published results of several classical methods and a new statistical approach (Clest).
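The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea of clustering the same data T times per candidate C and using the resulting distribution of validity-index values. The choice of k-means with random starts as the cost-function optimizer, the Calinski-Harabasz index as the validity index, and the majority vote over the T runs are all assumptions made for this example, not the method of the paper. All function names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_cost(data, c, rng, max_iter=100):
    """One k-means run from a random start; returns the final cost
    (within-cluster sum of squares). Assumed stand-in for the optimizer."""
    centers = rng.sample(data, c)
    for _ in range(max_iter):
        clusters = [[] for _ in range(c)]
        for p in data:
            nearest = min(range(c), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean_point(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, ctr) for ctr in centers) for p in data)


def calinski_harabasz(total_ss, within_ss, n, c):
    """Calinski-Harabasz index (assumed validity index); larger is better."""
    if within_ss <= 0.0:
        return float("inf")
    between_ss = total_ss - within_ss
    return (between_ss / (c - 1)) / (within_ss / (n - c))


def estimate_num_clusters(data, candidates, t, seed=0):
    """Cluster the same data t times for every candidate C (each C >= 2),
    then let each of the t runs vote for the C with the best index value."""
    rng = random.Random(seed)
    n = len(data)
    centroid = mean_point(data)
    total_ss = sum(sq_dist(p, centroid) for p in data)
    # index_values[c] holds t validity-index values, one per repeated run.
    index_values = {
        c: [calinski_harabasz(total_ss, kmeans_cost(data, c, rng), n, c)
            for _ in range(t)]
        for c in candidates
    }
    votes = Counter()
    for run in range(t):
        votes[max(candidates, key=lambda c: index_values[c][run])] += 1
    return votes.most_common(1)[0][0], votes, index_values


# Usage on synthetic data: three Gaussian blobs in the plane.
demo_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
demo_data = [(cx + demo_rng.gauss(0, 0.3), cy + demo_rng.gauss(0, 0.3))
             for cx, cy in blob_centers for _ in range(30)]
estimate, votes, _ = estimate_num_clusters(demo_data, [2, 3, 4, 5], t=20)
print(estimate, dict(votes))
```

Because individual k-means runs can end in different local optima, the vote counts show how stable the estimate is across the T repetitions, which a single final clustering per candidate C cannot reveal.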

Pages: 151-154

Number of pages: 4
