An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Cited by: 0

Authors:

Sardar T.H. [1]

Ansari Z. [2]

Affiliations:

[1] School of Computer Science and Engineering, Jain University, Bengaluru

[2] P.A. College of Engineering, Mangaluru

Source:

Ansari, Zahid (zahid_cs@pace.edu.in) | Year: 1600 / Volume: Springer / Issue: 101

Keywords:

Distributed computing; Document clustering; Hadoop; MapReduce; Parallel K-means

DOI:

10.1007/s40031-020-00485-2

Chinese Library Classification number:

Subject classification number:

Abstract:

Clustering is one of the important data mining techniques, and document clustering is among its many applications. Traditional clustering algorithms have proven inefficient for clustering large, rapidly generated real-world datasets. As a solution, traditional clustering algorithms are modified using a distributed programming paradigm. MapReduce is a popular distributed programming paradigm designed for the Hadoop distributed framework. This paper demonstrates a MapReduce-based modification of the K-Means clustering algorithm for document datasets. The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes. The experiments also show that MapReduce clustering works more efficiently when the dataset size and the Hadoop cluster size are large. © 2020, The Institution of Engineers (India).


Pages: 641-650

Number of pages: 9

50 items in total

[21] NEW ALGORITHM FOR CLUSTERING DISTRIBUTED DATA USING K-MEANS
Khedr, Ahmed M.
Bhatnagar, Raj K.
COMPUTING AND INFORMATICS, 2014, 33 (04) : 943 - 964
[22] Optimization of the Distributed K-means Clustering Algorithm Based on Set Pair Analysis
Ling, Song
Qi Yunfeng
2015 8th International Congress on Image and Signal Processing (CISP), 2015, : 1593 - 1598
[23] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
Mao, Yingchi
Xu, Ziyang
Li, Xiaofang
Ping, Ping
2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 3149 - 3156
[24] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
Mao, Yingchi
Xu, Ziyang
Ping, Ping
Wang, Longbao
2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 386 - 391
[25] An Effective and Efficient Clustering Based on K-Means Using MapReduce and TLBO
Pedireddla, Praveen Kumar
Yadwad, Sunita A.
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 3, 2016, 381 : 619 - 628
[26] An Efficient Data Structure for Document Clustering Using K-Means Algorithm
Killani, Ramanji
Satapathy, Suresh Chandra
Sowjanya, A. M.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 337 - +
[27] MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
Sardar T.H.
Ansari Z.
Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 131 - 142
[28] Document Clustering - A Feasible Demonstration with K-means Algorithm
Arif, Wajiha
Mahoto, Naeem Ahmed
2019 2ND INTERNATIONAL CONFERENCE ON COMPUTING, MATHEMATICS AND ENGINEERING TECHNOLOGIES (ICOMET), 2019,
[29] A k-means based clustering algorithm
Bloisi, Domenico Daniele
Iocchi, Luca
COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118
[30] MLK-means - A hybrid machine learning based k-means clustering algorithm for document clustering
Perumal, Pitchandi
Nedunchezhian, Raju
International Journal of Computer Science Issues, 2012, 9 (5 5-2) : 164 - 173
