An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Cited by: 0
Authors
Sardar T.H. [1]
Ansari Z. [2]
Affiliations
[1] School of Computer Science and Engineering, Jain University, Bengaluru
[2] P.A. College of Engineering, Mangaluru
Source
Ansari, Zahid (zahid_cs@pace.edu.in) | 2020 / Springer / Vol. 101
Keywords
Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means
DOI
10.1007/s40031-020-00485-2
Abstract
Clustering is considered one of the important data mining techniques, and document clustering is one of its many applications. Traditional clustering algorithms have proven inefficient for clustering large, rapidly generated real-world datasets. As a solution, traditional clustering algorithms are modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for the Hadoop distributed framework. This paper demonstrates a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes. The experiments also show that MapReduce clustering works more efficiently when the dataset and the Hadoop cluster are large. © 2020, The Institution of Engineers (India).
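The abstract does not give the authors' implementation, so the following is only a minimal sketch of how a K-Means iteration is commonly split into map and reduce steps. It is a single-process Python simulation, not Hadoop code; the function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical, and documents are assumed to be already converted to numeric vectors (for example term weights).

```python
from collections import defaultdict
from math import dist


def map_phase(split, centroids):
    """Mapper: emit (index of nearest centroid, (vector, 1)) per document vector."""
    for vec in split:
        nearest = min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))
        yield nearest, (vec, 1)


def reduce_phase(pairs, centroids):
    """Reducer: average the vectors grouped under each centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (vec, n) in pairs:
        if key not in sums:
            sums[key] = list(vec)
        else:
            sums[key] = [a + b for a, b in zip(sums[key], vec)]
        counts[key] += n
    # A centroid with no assigned documents keeps its previous position.
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(c)
        for i, c in enumerate(centroids)
    ]


def kmeans_mapreduce(splits, centroids, iterations=10):
    """Run map and reduce rounds until the centroids stop changing."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # Each split stands in for one input block handled by one mapper.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        new_centroids = reduce_phase(pairs, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

In this sketch each element of `splits` plays the role of one input block processed by a separate mapper; on a real Hadoop cluster the grouping by centroid index would be done by the shuffle stage rather than inside the reducer.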
Pages: 641 - 650
Page count: 9
Related papers
50 in total
  • [21] NEW ALGORITHM FOR CLUSTERING DISTRIBUTED DATA USING K-MEANS
    Khedr, Ahmed M.
    Bhatnagar, Raj K.
    COMPUTING AND INFORMATICS, 2014, 33 (04) : 943 - 964
  • [22] Optimization of the Distributed K-means Clustering Algorithm Based on Set Pair Analysis
    Ling, Song
    Qi, Yunfeng
    2015 8th International Congress on Image and Signal Processing (CISP), 2015, : 1593 - 1598
  • [23] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Li, Xiaofang
    Ping, Ping
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 3149 - 3156
  • [24] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Ping, Ping
    Wang, Longbao
    2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 386 - 391
  • [25] An Effective and Efficient Clustering Based on K-Means Using MapReduce and TLBO
    Pedireddla, Praveen Kumar
    Yadwad, Sunita A.
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 3, 2016, 381 : 619 - 628
  • [26] An Efficient Data Structure for Document Clustering Using K-Means Algorithm
    Killani, Ramanji
    Satapathy, Suresh Chandra
    Sowjanya, A. M.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 337 - +
  • [27] MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
    Sardar T.H.
    Ansari Z.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 131 - 142
  • [28] Document Clustering - A Feasible Demonstration with K-means Algorithm
    Arif, Wajiha
    Mahoto, Naeem Ahmed
    2019 2ND INTERNATIONAL CONFERENCE ON COMPUTING, MATHEMATICS AND ENGINEERING TECHNOLOGIES (ICOMET), 2019,
  • [29] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Iocchi, Luca
    COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118
  • [30] MLK-means - A hybrid machine learning based k-means clustering algorithm for document clustering
    Perumal, Pitchandi
    Nedunchezhian, Raju
    International Journal of Computer Science Issues, 2012, 9 (5-2) : 164 - 173