A study of large-scale data clustering based on fuzzy clustering

Authors
Yangyang Li
Guoli Yang
Haiyang He
Licheng Jiao
Ronghua Shang
Affiliations
[1] Xidian University, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, International Research Center for Intelligent Perception and Computation
Source
Soft Computing | 2016 / Vol. 20
Keywords
Fuzzy c-means algorithms; Data stream clustering; Handwritten digits images; Large-scale data
DOI
Not available
Abstract
Large-scale data are any data that cannot be loaded into the main memory of an ordinary computer. This is not an objective definition of large-scale data, but it makes the notion easy to understand. We first introduce some existing algorithms for clustering large-scale data, including some data stream clustering algorithms based on the fuzzy c-means (FCM) algorithm. In this paper, we propose a new structure for clustering large-scale data, and two new data stream clustering algorithms based on this structure are proposed in Sects. 3 and 4. In our method, we load the objects in the dataset one by one. We set a threshold on the membership: if the membership of an object in a cluster is larger than the threshold, the object is assigned to that cluster and the location of the nearest cluster center is updated; otherwise, the object is put into a temporary matrix, which we call the pool. When the pool is full, we cluster the data in the pool and update the locations of the cluster centers. The two algorithms are both based on this data stream structure and differ in how the objects in the data are weighted. We test our algorithms on a handwritten digit image dataset and several large-scale UCI datasets and compare them with some existing algorithms. The experiments show that our algorithm is more suitable for clustering large-scale datasets.
Pages: 3231 - 3242
Number of pages: 11
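The threshold-and-pool structure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the update formulas or the two weighting schemes, so everything below is an assumption made for illustration, not the authors' method: the function names (`memberships`, `weighted_fcm`, `flush_pool`, `stream_cluster`) are hypothetical, memberships use the standard FCM formula with fuzzifier `m`, an accepted object moves its center by a running mean, and a full pool is clustered together with the current centers, each weighted by the number of objects it has absorbed.

```python
import math
import random


def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in every cluster (fuzzifier m)."""
    d = [math.dist(x, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # x sits exactly on one or more centers
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_fcm(points, weights, centers, m=2.0, iters=20):
    """Apply the weighted FCM center update for a fixed number of iterations."""
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x, w in zip(points, weights):
            for k, u in enumerate(memberships(x, centers, m)):
                g = w * u ** m
                den[k] += g
                for j in range(dim):
                    num[k][j] += g * x[j]
        centers = [[v / den[k] for v in num[k]] if den[k] > 0 else centers[k]
                   for k in range(len(centers))]
    return centers


def flush_pool(pool, centers, counts, m=2.0):
    """Cluster the pooled objects and move the centers (assumed scheme)."""
    # Assumption: each current center stands in for the objects it absorbed.
    new_centers = weighted_fcm(pool + centers, [1.0] * len(pool) + counts,
                               centers, m)
    for x in pool:
        for k, u in enumerate(memberships(x, new_centers, m)):
            counts[k] += u  # soft count per cluster, also an assumption
    return new_centers


def stream_cluster(stream, init_centers, threshold=0.8, pool_size=50, m=2.0):
    """Process objects one by one; uncertain objects wait in the pool."""
    centers = [list(c) for c in init_centers]
    counts = [1.0] * len(centers)
    pool = []
    for x in stream:
        u = memberships(x, centers, m)
        k = max(range(len(u)), key=u.__getitem__)
        if u[k] > threshold:
            # Confident assignment: running-mean update of the nearest center.
            counts[k] += 1.0
            centers[k] = [c + (xi - c) / counts[k]
                          for c, xi in zip(centers[k], x)]
        else:
            pool.append(list(x))
            if len(pool) >= pool_size:
                centers = flush_pool(pool, centers, counts, m)
                pool = []
    if pool:  # cluster whatever is left when the stream ends
        centers = flush_pool(pool, centers, counts, m)
    return centers


# Usage on synthetic data: two Gaussian blobs around (0, 0) and (10, 10).
rng = random.Random(0)
data = [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]
        for mu in (0.0, 10.0) for _ in range(200)]
rng.shuffle(data)
print(stream_cluster(data, [[1.0, 1.0], [9.0, 9.0]]))
```

Only one object and the pool are held in memory at a time, which is the point of the streaming structure. The initial centers, the threshold of 0.8, and the pool size of 50 are arbitrary choices for this example.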
Related papers
50 in total
  • [1] A study of large-scale data clustering based on fuzzy clustering
    Li, Yangyang
    Yang, Guoli
    He, Haiyang
    Jiao, Licheng
    Shang, Ronghua
    [J]. SOFT COMPUTING, 2016, 20 (08) : 3231 - 3242
  • [2] Fuzzy clustering algorithm based on multiple medoids for large-scale data
    Chen, Ai-Guo
    Wang, Shi-Tong
    [J]. Kongzhi yu Juece/Control and Decision, 2016, 31 (12) : 2122 - 2130
  • [3] Large-scale parallel data clustering
    Judd, D
    McKinley, PK
    Jain, AK
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (08) : 871 - 876
  • [4] On the Clustering of Large-scale Data: A Matrix-based Approach
    Wang, Lijun
    Dong, Ming
    [J]. 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011 : 139 - 144
  • [5] Large-Scale Time Series Clustering Based on Fuzzy Granulation and Collaboration
    Wang, Xiao
    Yu, Fusheng
    Zhang, Huixin
    Liu, Shihu
    Wang, Jiayin
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2015, 30 (06) : 763 - 780
  • [6] Granulation-based Fuzzy Clustering of Large-scale Time Series
    Wang, Xiao
    Yu, Fusheng
    Zhang, Huixin
    [J]. 2013 10TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2013 : 466 - 471
  • [7] A stratified sampling based clustering algorithm for large-scale data
    Zhao, Xingwang
    Liang, Jiye
    Dang, Chuangyin
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 416 - 428
  • [8] A fast fuzzy clustering algorithm for large-scale datasets
    Shi, LK
    He, PL
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 203 - 208
  • [9] Speeding up the large-scale consensus fuzzy clustering for handling Big Data
    Sassi Hidri, Minyar
    Zoghlami, Mohamed Ali
    Ben Ayed, Rahma
    [J]. FUZZY SETS AND SYSTEMS, 2018, 348 : 50 - 74
  • [10] Fuzzy Clustering of Large-Scale Data Sets Using Principal Component Analysis
    Arfaoui, Olfa
    Sassi Hidri, Minyar
    [J]. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011 : 683 - 690