An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Authors
Sardar T.H. [1]
Ansari Z. [2]
Affiliations
[1] School of Computer Science and Engineering, Jain University, Bengaluru
[2] P.A. College of Engineering, Mangaluru
Source
Ansari, Zahid (zahid_cs@pace.edu.in) | Springer, Issue 101
Keywords
Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means
DOI
10.1007/s40031-020-00485-2
Abstract
Clustering is one of the important data mining techniques, and document clustering is one of its many applications. Traditional clustering algorithms have proven inefficient for clustering large, rapidly generated real-world datasets. As a solution, traditional clustering algorithms are modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for the Hadoop distributed framework. This paper demonstrates a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for clustering document datasets of all sizes. The experiments also show that MapReduce clustering works more efficiently when the dataset size and the Hadoop cluster size are large. © 2020, The Institution of Engineers (India).
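The abstract does not give the algorithm's details, so the following is only a minimal single-process sketch of how a K-Means iteration is commonly split into map and reduce steps. It assumes documents are already represented as fixed-length numeric vectors (for example, term weights); the function names `map_phase`, `reduce_phase`, and `kmeans_mapreduce` are hypothetical, and no Hadoop API is used. It is not the authors' implementation.

```python
from collections import defaultdict
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: distance(vec, centroids[i]))


def map_phase(split, centroids):
    """Mapper (assumed design): emit (cluster_id, (vector, 1)) per document in one split."""
    for vec in split:
        yield nearest(vec, centroids), (vec, 1)


def reduce_phase(pairs, centroids):
    """Reducer (assumed design): average the vectors assigned to each cluster."""
    sums = {}
    counts = defaultdict(int)
    for cid, (vec, n) in pairs:
        if cid in sums:
            sums[cid] = [s + v for s, v in zip(sums[cid], vec)]
        else:
            sums[cid] = list(vec)
        counts[cid] += n
    # A cluster that received no documents keeps its previous centroid.
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(c)
        for i, c in enumerate(centroids)
    ]


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until centroids move by at most tol or max_iter is reached."""
    for _ in range(max_iter):
        # On a real cluster each split would be mapped on a different node.
        pairs = [p for split in splits for p in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        shift = max(distance(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Two input splits of toy 2-D "document vectors" and two initial centroids.
splits = [[(1.0, 0.0), (0.9, 0.1)], [(0.0, 1.0), (0.1, 0.9)]]
result = kmeans_mapreduce(splits, [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch the only data exchanged between the two phases is the list of (cluster id, vector, count) pairs, which is what would make the assignment step parallelizable across input splits.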
Pages: 641-650
Number of pages: 9