Optimized big data K-means clustering using MapReduce

Cited by: 94

Authors:

Cui, Xiaoli [1]

Zhu, Pingfei [2]

Yang, Xin [1]

Li, Keqiu [1]

Ji, Changqing [3]

Affiliations:

[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Peoples R China

[2] Beijing China Power Informat Technol Co Ltd, State Grid Elect Power Res Inst, Beijing 100192, Peoples R China

[3] Dalian Univ, Sch Phys Sci & Technol, Dalian 116600, Peoples R China

Source:

JOURNAL OF SUPERCOMPUTING | 2014 / Vol. 70 / No. 03

Funding:

National Science Foundation (United States);

Keywords:

K-means; MapReduce; Sampling; Performance;

DOI:

10.1007/s11227-014-1225-7

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Clustering analysis is one of the most commonly used data processing techniques. For over half a century, K-means has remained the most popular clustering algorithm because of its simplicity. Recently, as data volumes continue to rise, some researchers have turned to MapReduce for high performance. However, MapReduce is poorly suited to iterative algorithms because every iteration incurs a job restart, a full read of the large input data, and a shuffle. In this paper, we address the problems of processing large-scale data with the K-means clustering algorithm and propose a novel processing model in MapReduce that eliminates the iteration dependence and obtains high performance. We analyze and implement our idea. Extensive experiments on our cluster demonstrate that our proposed methods are efficient, robust, and scalable.
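The abstract does not describe the authors' processing model, so the following is only a minimal sketch of the conventional iterative formulation whose cost it criticizes, not the proposed method. It simulates in plain Python how one K-means pass maps onto a map phase and a reduce phase, and counts how many such passes (each of which would be a separate MapReduce job rereading the full input) are needed to converge. The function names `map_phase`, `reduce_phase`, and `kmeans`, and the `max_iters` parameter, are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:  # grouping by key stands in for the shuffle step
        groups[idx].append(p)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iters=20):
    """Run map + reduce passes until the centroids stop changing.

    Returns the final centroids and the number of passes; in a real
    MapReduce deployment each pass would be one job over the whole dataset.
    """
    centroids = list(centroids)
    jobs = 0
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        jobs += 1
        if updated == centroids:
            break
        centroids = updated
    return centroids, jobs


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centroids, jobs = kmeans(points, [(0, 0), (10, 0)])
print(final_centroids, jobs)
```

Even on this four-point toy input the loop needs two passes: one to move the centroids and one to confirm that they no longer change. On large data each extra pass repeats the full read and shuffle, which is the iteration dependence the abstract refers to.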

Citation

Pages: 1249-1259

Number of pages: 11

