A Novel MapReduce Based k-Means Clustering

Times cited: 1

Authors:

Sinha, Ankita [1]

Jana, Prasanta K. [1]

Affiliation:

[1] Indian Sch Mines, Dept Comp Sci & Engn, Dhanbad, Bihar, India

Source:

PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION | 2017 / Vol. 458

Keywords:

Davies-Bouldin index; MapReduce; Clustering; k-Means; Big data; Algorithms

DOI:

10.1007/978-981-10-2035-3_26

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Data clustering has become indispensable in today's era of data deluge. k-Means is a popular partition-based clustering technique; however, as data grow in size and complexity, it no longer suffices in its sequential form, and there is an urgent need to shift towards parallel algorithms. We present a MapReduce-based k-Means clustering algorithm that is scalable and fault tolerant. The major advantage of the proposed work is that it determines the number of clusters dynamically, unlike k-Means, where the final number of clusters must be specified in advance. MapReduce jobs are sensitive to iteration, because repeated reads from and writes to the file system increase both cost and computation time. The proposed algorithm is not iterative: it reads the data from the file system and writes the output back only once. We show that the proposed algorithm performs better than an Improved MapReduce-based k-Means clustering algorithm.
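The abstract contrasts a single-pass design with conventional MapReduce k-Means, in which every iteration is a separate job that re-reads the input. The sketch below is a plain-Python simulation (not Hadoop code) of that conventional scheme, under the assumption that the mapper assigns each point to its nearest centroid and the reducer averages each group; each pass of the driver loop stands for one full MapReduce job. It is not the authors' algorithm, which the abstract does not detail. The `davies_bouldin` helper computes the Davies-Bouldin index named in the keywords; using it to compare candidate values of k is an illustrative assumption, not something the abstract states. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def map_phase(points, centroids):
    """Mapper: emit (centroid_id, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs):
    """Shuffle + reducer: group pairs by centroid_id and average each group."""
    sums, counts = {}, defaultdict(int)
    for cid, (point, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(point))
        sums[cid] = [s + p for s, p in zip(acc, point)]
        counts[cid] += n
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def kmeans_mapreduce(points, centroids, max_jobs=20):
    """Driver: every loop pass stands for one complete MapReduce job,
    i.e. one more full read of the input from the file system."""
    centroids = [list(c) for c in centroids]
    for job in range(1, max_jobs + 1):
        new = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        if updated == centroids:  # converged: no centroid moved
            return updated, job
        centroids = updated
    return centroids, max_jobs


def davies_bouldin(points, centroids):
    """Davies-Bouldin index (lower is better); needs k >= 2 distinct centroids."""
    k = len(centroids)
    labels = [nearest(p, centroids) for p in points]
    scatter = []  # mean distance of each cluster's members to its centroid
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, centroids[i]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Usage on toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers2, jobs2 = kmeans_mapreduce(points, [points[0], points[3]])
centers3, jobs3 = kmeans_mapreduce(points, [points[0], points[1], points[3]])
db2 = davies_bouldin(points, centers2)
db3 = davies_bouldin(points, centers3)
print(f"k=2: {jobs2} jobs, DB={db2:.3f}")
print(f"k=3: {jobs3} jobs, DB={db3:.3f}")
```

Even on this tiny input the driver needs more than one job, because convergence is only detected by a further pass over the data; on a real cluster each of those passes is a round trip through the distributed file system. A lower Davies-Bouldin value indicates more compact, better separated clusters, so here the two-cluster result scores lower than the three-cluster one.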


Pages: 247–255

Number of pages: 9
