An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Cited by: 0

Authors:

Sardar T.H. [1]

Ansari Z. [2]

Affiliations:

[1] School of Computer Science and Engineering, Jain University, Bengaluru

[2] P.A. College of Engineering, Mangaluru

Source:

Ansari, Zahid (zahid_cs@pace.edu.in) | 2020 / Springer / Vol. 101

Keywords:

Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means

DOI:

10.1007/s40031-020-00485-2

Abstract:

Clustering is considered an important data mining technique, and document clustering is one of its many applications. Traditional clustering algorithms have proven inefficient for clustering the large, rapidly generated datasets found in the real world. As a solution, traditional clustering algorithms have been modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm and is the processing model of the Hadoop distributed framework. This paper presents a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes. The experiments also show that MapReduce clustering is more efficient when both the dataset and the Hadoop cluster are large. © 2020, The Institution of Engineers (India).
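
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how one K-Means iteration is commonly split into map and reduce steps, not the authors' implementation. It assumes each document is already a sparse term-frequency vector (a Python dict) and uses cosine similarity to assign documents to centroids; both choices are illustrative assumptions, as are the hypothetical function names `cosine`, `map_phase`, `reduce_phase`, and `kmeans_mapreduce` and the toy documents. The shuffle that Hadoop performs between the two phases is simulated in memory by grouping mapper output by cluster id.

```python
from __future__ import annotations

import math
from collections import defaultdict

# Assumption: a document is a sparse mapping term -> weight (e.g. term frequency).


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_phase(doc_id, vec, centroids):
    """Hypothetical mapper: emit (nearest cluster id, (vector, 1)) for one document."""
    best = max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
    yield best, (vec, 1)


def reduce_phase(cluster_id, values):
    """Hypothetical reducer: average all vectors assigned to one cluster."""
    total = defaultdict(float)
    count = 0
    for vec, n in values:
        for term, w in vec.items():
            total[term] += w
        count += n
    return cluster_id, {term: w / count for term, w in total.items()}


def kmeans_mapreduce(docs, centroids, max_iter=10):
    """Driver: rerun the map/shuffle/reduce job until assignments stop changing."""
    assignments = {}
    for _ in range(max_iter):
        # Map + simulated shuffle: group (vector, 1) pairs by cluster id.
        groups = defaultdict(list)
        new_assignments = {}
        for doc_id, vec in docs.items():
            for cluster_id, value in map_phase(doc_id, vec, centroids):
                groups[cluster_id].append(value)
                new_assignments[doc_id] = cluster_id
        if new_assignments == assignments:
            break  # converged: no document changed cluster
        # Reduce: recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = list(centroids)
        for cluster_id, values in groups.items():
            _, new_centroids[cluster_id] = reduce_phase(cluster_id, values)
        assignments, centroids = new_assignments, new_centroids
    return assignments, centroids


# Toy usage with hypothetical documents, seeding the centroids with d1 and d3.
docs = {
    "d1": {"hadoop": 2, "mapreduce": 1},
    "d2": {"hadoop": 1, "mapreduce": 2},
    "d3": {"cricket": 2, "match": 1},
    "d4": {"cricket": 1, "match": 2},
}
assign, cents = kmeans_mapreduce(docs, [docs["d1"], docs["d3"]])
print(assign)  # {'d1': 0, 'd2': 0, 'd3': 1, 'd4': 1}
```

Because the reducer needs only per-cluster sums and counts, a combiner could pre-aggregate the `(vector, 1)` pairs on each mapper node to reduce shuffle traffic; the driver simply reruns the job until no document changes cluster or `max_iter` is reached.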

Pages: 641-650

Page count: 10