A Novel MapReduce Based k-Means Clustering

Cited by: 1
Authors
Sinha, Ankita [1]
Jana, Prasanta K. [1]
Affiliations
[1] Indian Sch Mines, Dept Comp Sci & Engn, Dhanbad, Bihar, India
Keywords
Davies-Bouldin index; MapReduce; Clustering; k-Means; BIG DATA; ALGORITHMS;
DOI
10.1007/978-981-10-2035-3_26
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data clustering is indispensable in today's era of data deluge. k-Means is a popular partition-based clustering technique, but as data grow in size and complexity it is no longer suitable, and there is an urgent need to shift towards parallel algorithms. We present a MapReduce-based k-Means clustering algorithm that is scalable and fault tolerant. Its major advantage is that it determines the number of clusters dynamically, whereas standard k-Means requires the final number of clusters to be specified in advance. MapReduce jobs are sensitive to iteration, because repeated reads from and writes to the file system increase both cost and computation time. The proposed algorithm is not iterative: it reads the data from the file system once and writes the output back once. We show that the proposed algorithm performs better than an Improved MapReduce-based k-Means clustering algorithm.
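The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the two ingredients it names, not the authors' method. It assumes a single map pass that assigns each point to its nearest centroid, a single reduce pass that averages each group, and the Davies-Bouldin index (listed in the keywords) as one possible way to compare candidate clusterings. The function names `map_phase`, `reduce_phase`, and `davies_bouldin` are hypothetical, and the code runs in memory rather than on a MapReduce cluster.

```python
import math
from collections import defaultdict


def map_phase(points, centroids):
    """Map step (assumed): emit (index of nearest centroid, point) per point."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs):
    """Reduce step (assumed): group points by key, average each group."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    clusters = [groups[key] for key in sorted(groups)]
    centroids = [tuple(sum(coord) / len(pts) for coord in zip(*pts))
                 for pts in clusters]
    return clusters, centroids


def davies_bouldin(clusters, centroids):
    """Davies-Bouldin index; lower values mean tighter, better-separated clusters.

    Needs at least two clusters with distinct centroids.
    """
    # Mean distance of each cluster's points to its own centroid.
    scatter = [sum(math.dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k
```

Under these assumptions, one could run the map and reduce steps for several candidate centroid sets and keep the clustering with the lowest index. How the paper itself selects the number of clusters is not stated in the abstract.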
Pages: 247-255
Page count: 9