Optimized big data K-means clustering using MapReduce

Cited by: 94
Authors
Cui, Xiaoli [1]
Zhu, Pingfei [2]
Yang, Xin [1]
Li, Keqiu [1]
Ji, Changqing [3]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China
[2] Beijing China Power Informat Technol Co Ltd, State Grid Elect Power Res Inst, Beijing 100192, Peoples R China
[3] Dalian Univ, Sch Phys Sci & Technol, Dalian 116600, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2014 / Vol. 70 / Issue 03
Funding
National Science Foundation (USA);
Keywords
K-means; MapReduce; Sampling; Performance;
DOI
10.1007/s11227-014-1225-7
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Clustering analysis is one of the most commonly used data processing techniques. For over half a century, K-means has remained the most popular clustering algorithm because of its simplicity. Recently, as data volumes continue to rise, some researchers have turned to MapReduce to obtain high performance. However, MapReduce is unsuitable for iterative algorithms because every iteration restarts a job and rereads and reshuffles the large data set. In this paper, we address the problems of processing large-scale data using the K-means clustering algorithm and propose a novel processing model in MapReduce to eliminate the iteration dependence and obtain high performance. We analyze and implement our idea. Extensive experiments on our cluster demonstrate that our proposed methods are efficient, robust and scalable.
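For orientation, the iteration dependence mentioned in the abstract can be sketched as follows. This is a minimal single-process illustration of conventional iterative K-means phrased as map and reduce steps, not the processing model proposed in the paper. The names `map_assign`, `reduce_recenter`, and `kmeans_iterative` are hypothetical, and each loop pass is only assumed to stand in for one MapReduce job.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_recenter(points):
    """Reduce step: average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iterative(data, centroids, max_iters=20):
    """Run K-means; each loop pass stands in for one job over the full data set."""
    jobs = 0
    for _ in range(max_iters):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for point in data:          # every pass rereads all of the data
            key, value = map_assign(point, centroids)
            groups[key].append(value)
        # Keep the old centroid when no point was assigned to it.
        new_centroids = [
            reduce_recenter(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]
        jobs += 1
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, jobs
```

Even on a small, well-separated input, the loop needs one extra full pass just to confirm that the centroids stopped moving; on a real cluster each such pass would mean another job start, data read, and shuffle.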
Pages: 1249 - 1259
Page count: 11
Related papers
50 in total
  • [1] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    [J]. The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [2] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    [J]. 9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
  • [3] K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
    Li, Yongyi
    Yang, Zhongqiang
    Han, Kaixu
    [J]. Engineering Intelligent Systems, 2021, 29 (06): : 411 - 418
  • [4] k-Means Clustering of Lines for Big Data
    Marom, Yair
    Feldman, Dan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce
    Madan, Suman
    Komalavalli, C.
    Bhatia, Manjot Kaur
    Laroiya, Chetna
    Arora, Monika
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (30) : 74233 - 74254
  • [6] MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Ahmad, Shahbaaz
    [J]. 2016 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2016, : 192 - 198
  • [7] Optimized data fusion for K-means Laplacian clustering
    Yu, Shi
    Liu, Xinhai
    Tranchevent, Leon-Charles
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    [J]. BIOINFORMATICS, 2011, 27 (01) : 118 - 126
  • [8] Optimized Data Fusion for Kernel k-Means Clustering
    Yu, Shi
    Tranchevent, Leon-Charles
    Liu, Xinhai
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (05) : 1031 - 1039
  • [9] Pillar K-Means Clustering Algorithm Using MapReduce Framework
    Ramdani, A. L.
    Firmansyah, H. B.
    [J]. INTERNATIONAL CONFERENCE ON SCIENCE, INFRASTRUCTURE TECHNOLOGY AND REGIONAL DEVELOPMENT, 2019, 258
  • [10] Cloud Based K-Means Clustering Running as a MapReduce Job for Big Data Healthcare Analytics Using Apache Mahout
    Rallapalli, Sreekanth
    Gondkar, R. R.
    Rao, Golajapu Venu Madhava
    [J]. INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 1, INDIA 2016, 2016, 433 : 127 - 135