A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10
Authors
Deng, Chuang [1]
Liu, Yang [1]
Xu, Lixiong [1]
Yang, Jie [1]
Liu, Junyong [1]
Li, Siguang [3]
Li, Maozhen [2, 3]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China
[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
Source
Funding
National Science Foundation (USA);
Keywords
CIM verification; stochastic sampling; clustering; MapReduce; load balancing;
DOI
10.1002/cpe.3580
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
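The MapReduce formulation of K-means mentioned in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `kmeans`), the tuple-based point representation, and the convergence tolerance are all assumptions made for illustration, and the sketch omits the genetic algorithm-based load balancing and the CIM-specific verification step entirely.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step (hypothetical): emit (nearest centroid index, point) per point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step (hypothetical): average the points grouped under each index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # a centroid with no points keeps its old position
    for idx, pts in groups.items():
        updated[idx] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return updated


def kmeans(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until the centroids move by at most tol."""
    for _ in range(max_iter):
        # Each split stands in for the input of one mapper.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment each split would be processed by a separate mapper and the grouping by centroid index would be done by the framework's shuffle stage; here both are simulated in one process to keep the sketch self-contained.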
Pages: 3096 - 3114
Page count: 19
Related papers
50 in total
  • [31] One-pass MapReduce-based clustering method for mixed large scale data
    Ben HajKacem, Mohamed Aymen
    Ben N'cir, Chiheb-Eddine
    Essoussi, Nadia
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (03) : 619 - 636
  • [32] One-pass MapReduce-based clustering method for mixed large scale data
    Mohamed Aymen Ben HajKacem
    Chiheb-Eddine Ben N’cir
    Nadia Essoussi
    Journal of Intelligent Information Systems, 2019, 52 : 619 - 636
  • [33] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    JOURNAL OF SUPERCOMPUTING, 2014, 70 (03) : 1249 - 1259
  • [34] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
  • [35] Optimized big data K-means clustering using MapReduce
    Xiaoli Cui
    Pingfei Zhu
    Xin Yang
    Keqiu Li
    Changqing Ji
    The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [36] Data decomposition for parallel K-means clustering
    Gursoy, A
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2004, 3019 : 241 - 248
  • [37] Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering
    Jumutc, Vilen
    Langone, Rocco
    Suykens, Johan A. K.
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015 : 2535 - 2540
  • [38] Fast K-means for Large Scale Clustering
    Hu, Qinghao
    Wu, Jiaxiang
    Bai, Lu
    Zhang, Yifan
    Cheng, Jian
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017 : 2099 - 2102
  • [39] MELT: Mapreduce-based Efficient Large-scale Trajectory Anonymization
    Ward, Katrina
    Lin, Dan
    Madria, Sanjay
    SSDBM 2017: 29TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2017,
  • [40] An Improved Sampling K-means Clustering Algorithm Based on MapReduce
    Zhang Ya-ling
    Wang Ya-nan
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017,