A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10

Authors:

Deng, Chuang [1]

Liu, Yang [1]

Xu, Lixiong [1]

Yang, Jie [1]

Liu, Junyong [1]

Li, Siguang [3]

Li, Maozhen [2,3]

Affiliations:

[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China

[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England

[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016, Vol. 28, Issue 11

Funding:

National Science Foundation (USA);

Keywords:

CIM verification; stochastic sampling; clustering; MapReduce; load balancing;

DOI:

10.1002/cpe.3580

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
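The abstract names a K-means algorithm expressed in the MapReduce model but gives no implementation detail. As a rough illustration only, the following is a minimal single-process sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). It is not the authors' implementation: it does not use Hadoop, it omits the stochastic sampling and the genetic algorithm-based load balancing mentioned above, and the function names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # Assumption: a centroid with no assigned points keeps its old position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Repeat map and reduce until the centroids stop changing."""
    centroids = list(centroids)
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Calling `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` runs the loop on four two-dimensional points with two initial centroids. In a real MapReduce deployment the map step would run in parallel over data splits and the reduce step would run once per centroid key.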


Pages: 3096-3114

Page count: 19

