An Efficient K-means Clustering Algorithm on MapReduce

Cited by: 0
Authors
Li, Qiuhong [1]
Wang, Peng [1]
Wang, Wei [1]
Hu, Hao [1]
Li, Zhongsheng
Li, Junxian [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
K-means is an important approach to analyzing massive data sets, so an efficient k-means implementation on MapReduce is crucial in many applications. In this paper we propose a series of strategies to improve the efficiency of k-means for massive high-dimensional data points on MapReduce. First, we use locality sensitive hashing (LSH) to map data points into buckets, based on which the original data points are converted into weighted representative points and outlier points. Then we propose an effective center initialization algorithm that yields higher-quality initial centers. Finally, we propose a pruning strategy that speeds up the iteration process by pruning unnecessary distance computations between centers and data points. An extensive empirical study shows that the proposed techniques greatly improve both the efficiency and the accuracy of k-means on MapReduce.
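The first step of the abstract, hashing points into buckets and replacing them with weighted representatives plus outliers, can be illustrated with a minimal single-machine sketch. Everything specific here is an assumption and not taken from the paper: the record does not say which LSH family is used (a Gaussian p-stable hash for Euclidean distance is assumed), and the names `make_hash`, `summarize`, `width`, and `min_bucket_size` are hypothetical. The MapReduce distribution, the center initialization, and the pruning strategy are not shown.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, num_projections, width, rng):
    """Build one LSH function mapping a point to a tuple of bucket ids.

    Assumption: a Gaussian (p-stable) family, h(x) = floor((a.x + b) / width).
    The paper's actual hash family is not stated in this record.
    """
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_projections)]
    offsets = [rng.uniform(0.0, width) for _ in range(num_projections)]

    def h(x):
        return tuple(
            math.floor((sum(a * xi for a, xi in zip(plane, x)) + b) / width)
            for plane, b in zip(planes, offsets)
        )

    return h


def summarize(points, h, min_bucket_size):
    """Convert points into (point, weight) pairs.

    Buckets with at least min_bucket_size members (a hypothetical threshold)
    collapse to their centroid, weighted by the member count. Points in
    smaller buckets are kept individually as outliers with weight 1.
    """
    buckets = defaultdict(list)
    for p in points:
        buckets[h(p)].append(p)

    weighted = []
    for members in buckets.values():
        if len(members) >= min_bucket_size:
            dim = len(members[0])
            centroid = tuple(
                sum(m[d] for m in members) / len(members) for d in range(dim)
            )
            weighted.append((centroid, len(members)))
        else:
            weighted.extend((tuple(m), 1) for m in members)
    return weighted
```

In this sketch the weights always sum to the number of input points, so a weighted k-means run over the summary sees the same total mass as the original data while computing distances for fewer points.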
Pages: 357-371
Number of pages: 15