Scalable fuzzy clustering algorithms

Author
Hall, Lawrence O. [1 ]
Affiliation
[1] Univ S Florida, Dept Comp Sci & Engn, ENB 118, Tampa, FL 33620 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is the most typical way to group unlabeled data. Today, very large unlabeled data sets are available. Many of them are too large to fit in the memory of a typical computer, and some are so large that they can only be treated as data streams, because not all of the data can be stored cost-effectively. Fuzzy clustering algorithms are known to be very useful on small to medium-size data sets. This talk focuses on how to make some well-understood classic fuzzy clustering algorithms scale to very large data sets and streaming data sets. The goal is to create a data partition that reflects the whole data set while requiring only practical computation times. In particular, we show that the fuzzy c-means family of algorithms can be scaled to provide data partitions that are very close, and potentially identical, to those obtained by clustering all the data at once. The general idea is to cluster subsets of the data and create weighted examples from the subsets. The weighted examples from previous partitions are combined with new data to create a new partition, which reflects both the examples currently loaded in memory and those partitioned previously. This process can be repeated until all the data has been clustered. Several variations on the theme of summarizing previous partitions with a set of weighted examples are given. Some history can be ignored, for example in time-changing data streams. One could also choose to cluster the summarizations themselves. Experimental data sets include several that contain tens of millions of examples, as well as streaming data sets. Results from real-world data sets show that excellent partitions are obtained. For data sets of tractable size, the partitions are shown to be comparable to those produced by fuzzy c-means when it clusters all the data.
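The chunk-and-summarize idea in the abstract can be illustrated with a minimal sketch. It is not the talk's actual algorithm: the function names (`memberships`, `weighted_fcm`, `chunked_fcm`), the fuzzifier m = 2, the Euclidean distance, and the rule that each summary center carries the summed weighted membership of the points it represents are all assumptions made for illustration.

```python
import random

def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, v)) for v in centers]
        if min(d2) == 0.0:
            # Point coincides with one or more centers: share membership among them.
            hits = d2.count(0.0)
            rows.append([1.0 / hits if d == 0.0 else 0.0 for d in d2])
        else:
            # Standard FCM update written in terms of squared distances.
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            rows.append([x / total for x in inv])
    return rows

def weighted_fcm(points, weights, centers, m=2.0, iters=200, tol=1e-8):
    """Fuzzy c-means in which point k counts weights[k] times in the center update."""
    dim = len(points[0])
    for _ in range(iters):
        U = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            coef = [w * row[i] ** m for w, row in zip(weights, U)]
            total = sum(coef)
            new_centers.append(
                [sum(cf * p[k] for cf, p in zip(coef, points)) / total
                 for k in range(dim)]
            )
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:
            break
    return centers

def chunked_fcm(chunks, c, m=2.0, seed=0):
    """Cluster one chunk at a time; earlier chunks survive only as c weighted centers."""
    rng = random.Random(seed)
    centers, center_weights = [], []
    for chunk in chunks:
        chunk = [list(p) for p in chunk]
        # Previous summary (weighted centers) plus the new, unit-weight examples.
        points = centers + chunk
        weights = center_weights + [1.0] * len(chunk)
        # Assumption: warm-start from the previous centers, random points otherwise.
        init = centers if centers else rng.sample(chunk, c)
        centers = weighted_fcm(points, weights, init, m)
        # Assumption: a center's weight is the total weighted membership it absorbed.
        U = memberships(points, centers, m)
        center_weights = [sum(w * row[i] for w, row in zip(weights, U))
                          for i in range(c)]
    return centers, center_weights
```

Under these assumptions only one chunk plus c summary points is held in memory at a time, and the center weights always sum to the number of examples seen so far, which is a convenient sanity check when experimenting with the sketch.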
Pages: 852-853
Page count: 2