Scalable fuzzy clustering algorithms

Cited by: 0

Authors:

Hall, Lawrence O. [1]

Affiliations:

[1] Univ S Florida, Dept Comp Sci & Engn, ENB 118, Tampa, FL 33620 USA

Source:

2008 ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY, VOLS 1 AND 2 | 2008

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering is the most typical way to group unlabeled data. Today, there are very large unlabeled data sets available. Many of these data sets are too large to fit in the memory of a typical computer. Some of these data sets are so large that they can only be treated as data streams because not all of the data can be stored in a cost-effective manner. Fuzzy clustering algorithms are known to be very useful on small to medium-size data sets. This talk focuses on how to make some well-understood classic fuzzy clustering algorithms scale to very large data sets and streaming data sets. The goal is to be able to create a data partition that reflects the whole data set, but requires practical computation times. In particular, we show that the fuzzy c-means families of algorithms can be scaled to provide data partitions that are very close and potentially identical to what you would get if you were able to cluster all the data. The general idea is to cluster subsets of the data and create weighted examples from the subsets. The weighted examples from one or more previous partitions are used with new data to create a new partition which reflects the examples currently loaded in memory and those partitioned previously. This process can be repeated until all the data has been clustered. Several variations on the theme of summarizing previous partitions with a set of weighted examples are given. Some history can be ignored, for example, in time-changing data streams. One could also choose to cluster summarizations. Experimental data sets include several which contain tens of millions of examples, as well as streaming data sets. Results from real-world data sets show excellent partitions are obtained. For tractable-size data sets it is shown that the partitions are comparable to those from fuzzy c-means when it clusters all the data.
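
The chunk-and-summarize idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the implementation presented in the talk. The function names (`weighted_fcm`, `single_pass_fcm`), the initialization from the first c points of the first chunk, and the rule that a summary center's weight is the weighted sum of memberships assigned to it are all assumptions made for the example.

```python
def _memberships(points, centers, m):
    """Fuzzy c-means memberships: u_ik is proportional to d_ik^(-2/(m-1))."""
    rows = []
    for p in points:
        # Squared distances, clamped so a point sitting on a center is safe.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(p, v)), 1e-12) for v in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        rows.append([x / total for x in inv])
    return rows


def weighted_fcm(points, weights, init, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means in which example k counts with weight weights[k]."""
    centers = [list(v) for v in init]
    dim = len(points[0])
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # Each example contributes w_k * u_ik^m to center i.
            coef = [w * row[i] ** m for w, row in zip(weights, u)]
            denom = sum(coef)
            new_centers.append(
                [sum(cf * p[j] for cf, p in zip(coef, points)) / denom
                 for j in range(dim)]
            )
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


def single_pass_fcm(chunks, c, m=2.0):
    """Cluster chunk by chunk, carrying earlier data forward as weighted centers."""
    centers, center_weights = [], []
    for chunk in chunks:
        # Weighted examples from the previous partition plus the new data.
        points = [tuple(v) for v in centers] + [tuple(p) for p in chunk]
        weights = center_weights + [1.0] * len(chunk)
        # Assumption: start from the previous centers, or the first c points.
        init = centers if centers else [list(p) for p in chunk[:c]]
        centers, u = weighted_fcm(points, weights, init, m)
        # Assumption: a center's weight is the total weighted membership it received.
        center_weights = [sum(w * row[i] for w, row in zip(weights, u))
                          for i in range(c)]
    return centers, center_weights
```

Because each example's memberships sum to one, the center weights in this sketch always add up to the number of examples processed so far, so the c weighted centers act as a compact stand-in for all earlier chunks. Discarding or down-weighting `center_weights` between chunks would be one way to ignore some history in a time-changing stream.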

Pages: 852-853

Page count: 2
