A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10

Authors:

Deng, Chuang [1]

Liu, Yang [1]

Xu, Lixiong [1]

Yang, Jie [1]

Liu, Junyong [1]

Li, Siguang [3]

Li, Maozhen [2,3]

Affiliations:

[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China

[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England

[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016 / Vol. 28 / No. 11

Funding:

National Science Foundation (USA);

Keywords:

CIM verification; stochastic sampling; clustering; MapReduce; load balancing;

DOI:

10.1002/cpe.3580

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
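
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a generic K-means iteration can be phrased as a map step and a reduce step, run here in a single process. It is not the authors' implementation: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans_sketch`) and the toy data are assumptions made for illustration, and the stochastic sampling, the genetic algorithm-based load balancing, and the CIM-specific verification logic are not modelled.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (partial_sum, count))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step (hypothetical name): merge (partial_sum, count) pairs into a mean."""
    total, count = None, 0
    for partial_sum, n in values:
        if total is None:
            total = list(partial_sum)
        else:
            total = [a + b for a, b in zip(total, partial_sum)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """One iteration: map every point, group by key (the shuffle), reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans_sketch(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


# Toy data, assumed for illustration only.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
result = kmeans_sketch(points, [(0, 0), (10, 10)])
```

In an actual MapReduce deployment the map calls would run on separate nodes over partitions of the data and the grouping would be done by the framework's shuffle phase. Emitting `(partial_sum, count)` pairs rather than bare points keeps the reduce step valid if a combiner pre-aggregates values on each node.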

Pages: 3096-3114

Number of pages: 19

50 items in total

[31] One-pass MapReduce-based clustering method for mixed large scale data
Ben HajKacem, Mohamed Aymen
Ben N'cir, Chiheb-Eddine
Essoussi, Nadia
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (03) : 619 - 636
[32] One-pass MapReduce-based clustering method for mixed large scale data
Mohamed Aymen Ben HajKacem
Chiheb-Eddine Ben N’cir
Nadia Essoussi
Journal of Intelligent Information Systems, 2019, 52 : 619 - 636
[33] Optimized big data K-means clustering using MapReduce
Cui, Xiaoli
Zhu, Pingfei
Yang, Xin
Li, Keqiu
Ji, Changqing
JOURNAL OF SUPERCOMPUTING, 2014, 70 (03) : 1249 - 1259
[34] Efficient MapReduce Kernel k-Means for Big Data Clustering
Tsapanos, Nikolaos
Tefas, Anastasios
Nikolaidis, Nikolaos
Pitas, Ioannis
9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
[35] Optimized big data K-means clustering using MapReduce
Xiaoli Cui
Pingfei Zhu
Xin Yang
Keqiu Li
Changqing Ji
The Journal of Supercomputing, 2014, 70 : 1249 - 1259
[36] Data decomposition for parallel K-means clustering
Gursoy, A
PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2004, 3019 : 241 - 248
[37] Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering
Jumutc, Vilen
Langone, Rocco
Suykens, Johan A. K.
PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2535 - 2540
[38] Fast K-means for Large Scale Clustering
Hu, Qinghao
Wu, Jiaxiang
Bai, Lu
Zhang, Yifan
Cheng, Jian
CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2099 - 2102
[39] MELT: Mapreduce-based Efficient Large-scale Trajectory Anonymization
Ward, Katrina
Lin, Dan
Madria, Sanjay
SSDBM 2017: 29TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2017,
[40] An Improved Sampling K-means Clustering Algorithm Based on MapReduce
Zhang Ya-ling
Wang Ya-nan
2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017,