An Efficient K-means Clustering Algorithm on MapReduce

Cited by: 0
Authors
Li, Qiuhong [1]
Wang, Peng [1]
Wang, Wei [1]
Hu, Hao [1]
Li, Zhongsheng
Li, Junxian [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
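Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract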
K-means is an important approach to analyzing massive data sets, and an efficient implementation on MapReduce is crucial in many applications. In this paper, we propose a series of strategies to improve the efficiency of k-means for massive high-dimensional data points on MapReduce. First, we use locality-sensitive hashing (LSH) to map data points into buckets, based on which the original data points are converted into weighted representative points and outlier points. Then we propose an effective center initialization algorithm that yields higher-quality initial centers. Finally, we propose a pruning strategy that speeds up the iteration process by skipping unnecessary distance computations between centers and data points. An extensive empirical study shows that the proposed techniques greatly improve both the efficiency and the accuracy of k-means on MapReduce.
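
The first step of the abstract, bucketing points with LSH and replacing each bucket by a weighted representative, can be illustrated with a minimal single-machine sketch. It is not the authors' implementation, since the abstract does not give the details. The random-projection hash family, the `min_bucket_size` rule for deciding which points count as outliers, and all function and parameter names are assumptions made for illustration. The MapReduce distribution, the center initialization and the pruning strategy are not shown.

```python
import math
import random
from collections import defaultdict


def lsh_buckets(points, num_hashes=4, bucket_width=1.0, seed=0):
    """Group point indices by a random-projection LSH signature (assumed hash family)."""
    rng = random.Random(seed)
    dim = len(points[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
    offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
    buckets = defaultdict(list)
    for index, point in enumerate(points):
        # Each hash projects the point onto a random direction and quantizes the result.
        signature = tuple(
            math.floor((sum(a * x for a, x in zip(direction, point)) + offset) / bucket_width)
            for direction, offset in zip(directions, offsets)
        )
        buckets[signature].append(index)
    return buckets


def weighted_mean(vectors, weights):
    """Component-wise weighted mean of a list of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def summarize(points, buckets, min_bucket_size=2):
    """Replace each large enough bucket by its mean, weighted by the bucket size.

    Points in smaller buckets are kept as weight-1 outliers (hypothetical rule).
    """
    reps, weights = [], []
    for members in buckets.values():
        if len(members) >= min_bucket_size:
            group = [points[i] for i in members]
            reps.append(weighted_mean(group, [1.0] * len(group)))
            weights.append(float(len(group)))
        else:
            for i in members:
                reps.append(list(points[i]))
                weights.append(1.0)
    return reps, weights


def weighted_kmeans_step(reps, weights, centers):
    """One Lloyd iteration on weighted points: assign, then recompute weighted means."""
    labels = [
        min(range(len(centers)), key=lambda k: math.dist(rep, centers[k]))
        for rep in reps
    ]
    new_centers = []
    for k, center in enumerate(centers):
        members = [i for i, label in enumerate(labels) if label == k]
        if members:
            new_centers.append(
                weighted_mean([reps[i] for i in members], [weights[i] for i in members])
            )
        else:
            new_centers.append(list(center))  # an empty cluster keeps its old center
    return new_centers, labels
```

Because each representative carries the size of its bucket as its weight, the weighted mean of the representatives equals the mean of the original points, so a k-means iteration over the much smaller summarized set approximates an iteration over the full data.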
Pages: 357-371
Page count: 15