An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

被引:0
|
作者
Sardar T.H. [1 ]
Ansari Z. [2 ]
机构
[1] School of Computer Science and Engineering, Jain University, Bengaluru
[2] P.A. College of Engineering, Mangaluru
来源
Ansari, Zahid (zahid_cs@pace.edu.in) | 1600年 / Springer卷 / 101期
关键词
Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means;
D O I
10.1007/s40031-020-00485-2
中图分类号
学科分类号
摘要
Clustering is considered as one of the important data mining techniques. Document clustering is among many applications of clustering. The traditional clustering algorithms are proven inefficient for clustering rapidly generating large real world datasets. As a solution, traditional clustering algorithms are modified using distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for Hadoop distributed framework. This paper demonstrates a MapReduce based modification of K-Means clustering algorithm for document datasets. The result shows that the proposed algorithm is efficient than traditional K-Means for all size of document datasets clustering. The experiments also show that the MapReduce clustering works more efficiently when the dataset size and Hadoop cluster sizes are large. © 2020, The Institution of Engineers (India).
引用
收藏
页码:641 / 650
页数:9
相关论文
共 50 条
  • [31] K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
    Li, Yongyi
    Yang, Zhongqiang
    Han, Kaixu
    Engineering Intelligent Systems, 2021, 29 (06): : 411 - 418
  • [32] Distributed Algorithm for Text Documents Clustering Based on k-Means Approach
    Sarnovsky, Martin
    Carnoka, Noema
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2015, PT II, 2016, 430 : 165 - 174
  • [33] An Improved K-means Algorithm based on Mapreduce and Grid
    Ma, Li
    Gu, Lei
    Li, Bo
    Ma, Yue
    Wang, Jin
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2015, 8 (01): : 189 - 199
  • [34] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    JOURNAL OF SUPERCOMPUTING, 2014, 70 (03): : 1249 - 1259
  • [35] Optimized big data K-means clustering using MapReduce
    Xiaoli Cui
    Pingfei Zhu
    Xin Yang
    Keqiu Li
    Changqing Ji
    The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [36] Research on k-means Clustering Algorithm An Improved k-means Clustering Algorithm
    Shi Na
    Liu Xumin
    Guan Yong
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010, : 63 - 67
  • [37] Distributed Clustering Based on K-means and CPGA
    Zhou, Jun
    Liu, Zhijing
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 444 - 447
  • [38] K-means algorithm based on particle swarm optimization for web document clustering
    Xiao, L. Z.
    Shao, Z. Q.
    Gu, X. M.
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 980 - 984
  • [39] An Improved Hierarchical K-Means Algorithm for Web Document Clustering
    Liu, Yongxin
    Liu, Zhijng
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 606 - 610
  • [40] Text Document Clustering Based on Density K-means
    Wu, Di
    Zeng, Yan
    Qu, Yin-chuan
    INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016), 2016,