Scalable fuzzy clustering algorithms

Cited by: 0

Authors:

Hall, Lawrence O. [1]

Affiliation:

[1] Univ S Florida, Dept Comp Sci & Engn, ENB 118, Tampa, FL 33620 USA

Source:

2008 ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY, VOLS 1 AND 2 | 2008

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering is the most typical way to group unlabeled data. Today, there are very large unlabeled data sets available. Many of these data sets are too large to fit in the memory of a typical computer. Some of these data sets are so large that they can only be treated as data streams because not all of the data can be stored in a cost-effective manner. Fuzzy clustering algorithms are known to be very useful on small to medium-size data sets. This talk focuses on how to make some well understood classic fuzzy clustering algorithms scale to very large data sets and streaming data sets. The goal is to be able to create a data partition that reflects the whole data set, but requires practical computation times. In particular, we show that the fuzzy c-means families of algorithms can be scaled to provide data partitions that are very close and potentially identical to what you would get if you were able to cluster all the data. The general idea is to cluster subsets of the data and create weighted examples from the subsets. The weighted examples from a previous partition(s) are used with new data to create a new partition which reflects the examples currently loaded in memory and those partitioned previously. This process can be repeated until all the data has been clustered. Several variations on the theme of summarizing previous partitions with a set of weighted examples are given. Some history can be ignored, for example, in time changing data streams. One could also choose to cluster summarizations. Experimental data sets include several which contain tens of millions of examples, as well as streaming data sets. Results from real-world data sets show excellent partitions are obtained. For tractable size data sets it is shown that the partitions are comparable to those from fuzzy c-means when it clusters all the data.
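
The weighted-example idea in the abstract can be illustrated with a minimal sketch. It is not the talk's actual implementation: it assumes standard fuzzy c-means with fuzzifier m = 2, squared Euclidean distance, and one plausible summarization in which each cluster center from a chunk becomes a single weighted example whose weight is the total (weighted) membership assigned to it. The names `weighted_fcm`, `stream_fcm`, and `farthest_point_init`, as well as the farthest-point initialization, are hypothetical choices made for this illustration.

```python
def sq_dist(x, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, v))

def farthest_point_init(points, c):
    """Deterministic start (an assumption of this sketch): first point,
    then repeatedly the point farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda x: min(sq_dist(x, v) for v in centers))
        centers.append(list(far))
    return centers

def memberships(points, centers, m):
    """Standard FCM membership update; rows sum to 1."""
    U = []
    for x in points:
        # Clamp to avoid division by zero when a point sits on a center.
        inv = [max(sq_dist(x, v), 1e-12) ** (-1.0 / (m - 1)) for v in centers]
        total = sum(inv)
        U.append([t / total for t in inv])
    return U

def weighted_fcm(points, weights, c, m=2.0, init=None, iters=100, tol=1e-6):
    """Fuzzy c-means in which example k counts weights[k] times."""
    centers = [list(v) for v in init] if init else farthest_point_init(points, c)
    dim = len(points[0])
    for _ in range(iters):
        U = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            coef = [w * (u[i] ** m) for w, u in zip(weights, U)]
            total = sum(coef)
            new_centers.append(
                [sum(cf * x[k] for cf, x in zip(coef, points)) / total
                 for k in range(dim)])
        shift = max(abs(a - b) for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def stream_fcm(chunks, c, m=2.0):
    """Cluster chunk by chunk, carrying history as c weighted examples."""
    centers, summary_w = None, []
    for chunk in chunks:
        pts = [list(p) for p in chunk]
        wts = [1.0] * len(pts)          # fresh examples have weight 1
        if centers is not None:
            pts += centers              # previous centers act as examples
            wts += summary_w            # weighted by the data they summarize
        centers, U = weighted_fcm(pts, wts, c, m, init=centers)
        # New summary weight of center i: total weighted membership in it.
        summary_w = [sum(w * u[i] for w, u in zip(wts, U)) for i in range(c)]
    return centers, summary_w
```

Calling `stream_fcm(chunks, c=2)` on an iterable of chunks (each a list of coordinate lists) returns the final centers and their summary weights. Because memberships sum to 1 for every example, the summary weights always add up to the number of examples seen so far, which is how the carried-over centers stand in for the data no longer in memory. Dropping or down-weighting `summary_w` between chunks would be one way to ignore some history for time-changing streams, as the abstract mentions.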

Pages: 852-853

Page count: 2
