A Novel MapReduce Based k-Means Clustering

Times cited: 1

Authors:

Sinha, Ankita [1]

Jana, Prasanta K. [1]

Affiliation:

[1] Indian School of Mines, Department of Computer Science & Engineering, Dhanbad, Bihar, India

Source:

PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION | 2017 / Vol. 458

Keywords:

Davies-Bouldin index; MapReduce; Clustering; k-Means; Big data; Algorithms

DOI:

10.1007/978-981-10-2035-3_26

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Data clustering is indispensable in today's era of data deluge. k-Means is a popular partition-based clustering technique; however, as data grow in size and complexity, the sequential k-Means algorithm is no longer suitable, and there is an urgent need to shift towards parallel algorithms. We present a MapReduce-based k-Means clustering algorithm that is scalable and fault tolerant. Its major advantage is that it determines the number of clusters dynamically, whereas standard k-Means requires the final number of clusters to be specified in advance. MapReduce jobs are sensitive to iteration, because repeated reads from and writes to the file system increase both cost and computation time. The proposed algorithm is not iterative: it reads the data from the file system and writes the output back only once. We show that the proposed algorithm performs better than an Improved MapReduce-based k-Means clustering algorithm.
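The abstract contrasts the proposed method with iterative MapReduce k-Means but does not describe the internals of either. As background only, the following is a minimal single-process sketch of the conventional iterative pattern, not the authors' algorithm: each k-Means iteration is one MapReduce job whose map step assigns a point to its nearest centroid and whose reduce step averages the points sharing a centroid, and a driver reruns the job until the centroids stop moving. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_job`, `kmeans_driver`) are hypothetical, and the shuffle phase is simulated with a dictionary.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (hypothetical): average all points emitted under one key."""
    total, count = None, 0
    for point, n in values:
        total = list(point) if total is None else [t + p for t, p in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_job(points, centroids):
    """One simulated MapReduce job, i.e. one k-Means iteration over all input."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a centroid that receives no points is kept
    for key, values in groups.items():
        new_centroids[key] = kmeans_reduce(values)
    return new_centroids


def kmeans_driver(points, centroids, max_iter=20, tol=1e-9):
    """Driver: rerun the job until no centroid moves more than tol."""
    for iteration in range(1, max_iter + 1):
        new_centroids = kmeans_job(points, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            return centroids, iteration
    return centroids, max_iter


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans_driver(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Two points of the abstract are visible here: the number of clusters is fixed by the length of the initial `centroids` list, and on a real cluster every call to `kmeans_job` would reread the full input from the distributed file system, which is the per-iteration cost the proposed single-pass algorithm is said to avoid.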


Pages: 247-255

Page count: 9

Related records: 50 in total (entries 41-50 listed)

[41] Shi Na; Liu Xumin; Guan Yong. Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm. 2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010: 63-67.
[42] Hu, Jinzhu; Xiong, Chunxiu; Shu, Jiangbo; Zhou, Xing; Zhu, Jun. A Novel Text Clustering Method Based on TGSOM and Fuzzy K-Means. PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL I, 2009: 26-30.
[43] Ling, Liyun; Gu, Yan; Zhang, Su; Wen, Jie. A Generalized k-Means Problem for Clustering and an ADMM-Based k-Means Algorithm. JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2024, 20(6): 2089-2115.
[44] Deng, Chuang; Liu, Yang; Xu, Lixiong; Yang, Jie; Liu, Junyong; Li, Siguang; Li, Maozhen. A MapReduce-based parallel K-means clustering for large-scale CIM data verification. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28(11): 3096-3114.
[45] Tang, Zhuo; Liu, Kunkun; Xiao, Jinbo; Yang, Li; Xiao, Zheng. A parallel k-means clustering algorithm based on redundance elimination and extreme points optimization employing MapReduce. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29(20).
[46] Yao, Shunyuan. An Improved Differential Privacy K-means Algorithm Based on MapReduce. 2018 11TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2018: 141-145.
[47] Savas, Caner; Dovis, Fabio. Multipath Detection based on K-means Clustering. PROCEEDINGS OF THE 32ND INTERNATIONAL TECHNICAL MEETING OF THE SATELLITE DIVISION OF THE INSTITUTE OF NAVIGATION (ION GNSS+ 2019), 2019: 3801-3811.
[48] Malyszko, Dariusz; Stepaniuk, Jaroslaw. Rough Entropy Based k-Means Clustering. ROUGH SETS, FUZZY SETS, DATA MINING AND GRANULAR COMPUTING, PROCEEDINGS, 2009, 5908: 406-413.
[49] Zhou, Jun; Liu, Zhijing. Distributed Clustering Based on K-means and CPGA. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008: 444-447.
[50] Li, Youguo; Wu, Haiyan. A Clustering Method Based on K-Means Algorithm. INTERNATIONAL CONFERENCE ON SOLID STATE DEVICES AND MATERIALS SCIENCE, 2012, 25: 1104-1109.
