Scalable fuzzy clustering algorithms

Author
Hall, Lawrence O. [1 ]
Affiliation
[1] Univ S Florida, Dept Comp Sci & Engn, ENB 118, Tampa, FL 33620 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is the most typical way to group unlabeled data. Today, there are very large unlabeled data sets available. Many of these data sets are too large to fit in the memory of a typical computer. Some of these data sets are so large that they can only be treated as data streams because not all of the data can be stored in a cost-effective manner. Fuzzy clustering algorithms are known to be very useful on small to medium-size data sets. This talk focuses on how to make some well understood classic fuzzy clustering algorithms scale to very large data sets and streaming data sets. The goal is to be able to create a data partition that reflects the whole data set, but requires practical computation times. In particular, we show that the fuzzy c-means families of algorithms can be scaled to provide data partitions that are very close and potentially identical to what you would get if you were able to cluster all the data. The general idea is to cluster subsets of the data and create weighted examples from the subsets. The weighted examples from a previous partition(s) are used with new data to create a new partition which reflects the examples currently loaded in memory and those partitioned previously. This process can be repeated until all the data has been clustered. Several variations on the theme of summarizing previous partitions with a set of weighted examples are given. Some history can be ignored, for example, in time changing data streams. One could also choose to cluster summarizations. Experimental data sets include several which contain tens of millions of examples, as well as streaming data sets. Results from real-world data sets show excellent partitions are obtained. For tractable size data sets it is shown that the partitions are comparable to those from fuzzy c-means when it clusters all the data.
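The idea of clustering subsets and carrying weighted examples forward can be sketched in a few lines. This is only an illustrative sketch, not the implementation described in the talk: the function names (`memberships`, `weighted_fcm`, `single_pass_fcm`), the farthest-point initialization, the fuzzifier `m = 2`, and the rule that each centroid's weight is the sum of the weighted memberships assigned to it are all assumptions made for the example.

```python
import math

def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        d = [max(math.dist(x, v), 1e-12) for v in centers]
        rows.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return rows

def weighted_fcm(points, weights, c, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means in which each example carries a weight in the center update."""
    # Assumed initialization: the first point, then repeatedly the point
    # farthest from the centers chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(math.dist(p, v) for v in centers)))
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            coef = [w * row[i] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * x[j] for cf, x in zip(coef, points)) / total
                for j in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def single_pass_fcm(chunks, c, m=2.0):
    """Cluster one chunk at a time, summarizing history as c weighted centroids."""
    summary_pts, summary_wts = [], []
    for chunk in chunks:
        # Weighted centroids from earlier chunks join the new, unit-weight examples.
        pts = summary_pts + [tuple(x) for x in chunk]
        wts = summary_wts + [1.0] * len(chunk)
        centers, u = weighted_fcm(pts, wts, c, m)
        summary_pts = list(centers)
        # Assumed weighting rule: a centroid inherits the weighted membership mass.
        summary_wts = [sum(w * row[i] for w, row in zip(wts, u)) for i in range(c)]
    return summary_pts, summary_wts
```

Only one chunk plus `c` weighted centroids is held in memory at any time. Because memberships sum to one for each example, the centroid weights in this sketch always add up to the number of examples seen so far. Scaling down `summary_wts` before each new chunk would be one way to let older history fade in a time-changing stream.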
Pages: 852-853
Page count: 2