MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability

Cited by: 54
Author
Ludwig, Simone A. [1]
Affiliation
[1] North Dakota State University, Department of Computer Science, Fargo, ND 58105, USA
Keywords
MapReduce; Hadoop; Scalability
DOI
10.1007/s13042-015-0367-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The management and analysis of big data has been identified as one of the most important emerging needs in recent years, because of the sheer volume and increasing complexity of the data being created or collected. Current clustering algorithms cannot handle big data, so scalable solutions are necessary. Since fuzzy clustering algorithms have been shown to outperform hard clustering approaches in terms of accuracy, this paper investigates the parallelization and scalability of a common and effective fuzzy clustering algorithm, fuzzy c-means (FCM). The algorithm is parallelized using the MapReduce paradigm, and the paper outlines how the Map and Reduce primitives are implemented. A validity analysis shows that the implementation works correctly and achieves purity results competitive with state-of-the-art clustering algorithms. Furthermore, a scalability analysis demonstrates the performance of the parallel FCM implementation as the number of computing nodes increases.
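The abstract does not spell out the Map and Reduce primitives. As a rough, single-machine sketch of how one FCM iteration can be split into a map step and a reduce step, the following assumes the standard FCM update rules, Euclidean distance, and a fuzzifier m = 2. The function names (`memberships`, `fcm_map`, `fcm_reduce`, `fcm_iteration`) are hypothetical and are not taken from the paper or from Hadoop.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (sums to 1)."""
    d = [dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:
        return [1.0 if j == zeros[0] else 0.0 for j in range(len(centers))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in d) for dj in d]


def fcm_map(point, centers, m=2.0):
    """Map: for one record, emit (cluster index, (weighted point, weight))."""
    for j, u in enumerate(memberships(point, centers, m)):
        w = u ** m
        yield j, ([w * x for x in point], w)


def fcm_reduce(values):
    """Reduce: sum the partial results of one cluster into a new center.

    Returns None if the cluster received zero total weight.
    """
    values = list(values)
    total_w = sum(w for _, w in values)
    if total_w == 0.0:
        return None
    dims = len(values[0][0])
    return [sum(v[k] for v, _ in values) / total_w for k in range(dims)]


def fcm_iteration(points, centers, m=2.0):
    """One FCM iteration: map every point, group by cluster, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        for j, val in fcm_map(p, centers, m):
            grouped[j].append(val)
    new_centers = []
    for j, old in enumerate(centers):
        c = fcm_reduce(grouped[j])
        # Assumption: keep the old center if a cluster got no weight.
        new_centers.append(c if c is not None else list(old))
    return new_centers


# Example usage on a toy data set with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(10):
    centers = fcm_iteration(points, centers)
print(centers)
```

In a real MapReduce job the `grouped` dictionary would be replaced by the framework's shuffle phase, and the current centers would have to be distributed to every mapper before each iteration; how the paper handles these points is not stated in the abstract.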
Pages: 923-934
Number of pages: 12
Related papers
  • [1] MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability
    Simone A. Ludwig
    [J]. International Journal of Machine Learning and Cybernetics, 2015, 6: 923-934
  • [2] MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
    Sardar T.H.
    Ansari Z.
    [J]. Journal of The Institution of Engineers (India): Series B, 2022, 103 (01): 131-142
  • [3] An Improved Fuzzy C-Means Algorithm Based on MapReduce
    Yu, Qing
    Ding, Zhimin
    [J]. 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), 2015: 634-638
  • [4] MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation
    Li, Xiu
    Song, Jingdong
    Zhang, Fan
    Ouyang, Xiaogang
    Khan, Samee U.
    [J]. Future Generation Computer Systems-The International Journal of eScience, 2016, 65: 90-101
  • [5] A MapReduce-based K-means clustering algorithm
    Mao, YiMin
    Gan, DeJin
    Mwakapesa, D. S.
    Nanehkaran, Y. A.
    Tao, Tao
    Huang, XueYu
    [J]. The Journal of Supercomputing, 2022, 78 (04): 5181-5202
  • [6] A new fuzzy relational clustering algorithm based on the fuzzy C-means algorithm
    Corsini, P
    Lazzerini, B
    Marcelloni, F
    [J]. Soft Computing, 2005, 9 (06): 439-447
  • [7] Implementation of Fuzzy C-Means (FCM) Clustering Based Camouflage Image Generation Algorithm
    Xiao, Weijie
    Zhao, Yan
    Gao, Xiaohui
    Liao, Congwei
    Huang, Shengxiang
    Deng, Lianwen
    [J]. IEEE Access, 2021, 9: 120203-120209
  • [8] A Kernelized Fuzzy C-means Clustering Algorithm based on Bat Algorithm
    Cheng, Chunying
    Bao, Chunhua
    [J]. Proceedings of 2018 10th International Conference on Computer and Automation Engineering (ICCAE 2018), 2018: 1-5