An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Cited by: 0

Authors:

Sardar T.H. [1]

Ansari Z. [2]

Affiliations:

[1] School of Computer Science and Engineering, Jain University, Bengaluru

[2] P.A. College of Engineering, Mangaluru

Source:

Ansari, Zahid (zahid_cs@pace.edu.in) | Year 1600 / Volume Springer / Issue 101

Keywords:

Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means

DOI:

10.1007/s40031-020-00485-2

Abstract:

Clustering is one of the most important data mining techniques, and document clustering is among its many applications. Traditional clustering algorithms have proven inefficient for large, rapidly growing real-world datasets. As a solution, traditional clustering algorithms are modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for the Hadoop distributed framework. This paper demonstrates a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes. The experiments also show that MapReduce clustering works more efficiently when the dataset and the Hadoop cluster are large. © 2020, The Institution of Engineers (India).
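
The abstract does not give the paper's implementation, so the following is only a minimal single-machine sketch of how one K-Means iteration is commonly split into a map step and a reduce step. It assumes each document has already been converted to a dense numeric vector (for example TF-IDF weights) and uses Euclidean distance; the function names `kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, and `kmeans` are hypothetical and do not come from the paper or from Hadoop.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that were assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20):
    """Repeat iterations until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In an actual Hadoop job the map calls would run in parallel over splits of the dataset and the framework would perform the grouping by key; the dictionary here only imitates that shuffle so the logic can be followed on one machine.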

Pages: 641-650

Number of pages: 9
