A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10
Authors
Deng, Chuang [1]
Liu, Yang [1]
Xu, Lixiong [1]
Yang, Jie [1]
Liu, Junyong [1]
Li, Siguang [3]
Li, Maozhen [2, 3]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China
[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
Source
Funding
US National Science Foundation
Keywords
CIM verification; stochastic sampling; clustering; MapReduce; load balancing
DOI
10.1002/cpe.3580
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The Common Information Model (CIM) is widely used in electric power grids for data exchange among auxiliary systems such as communication, monitoring, and marketing systems. With the rapid deployment of digital devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data challenging. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely adopted for data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance workloads among heterogeneous computing nodes and further improve computational efficiency. The performance of the parallel K-means is first evaluated on a small-scale in-house MapReduce cluster, then on a commercial cloud computing platform, and finally in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means significantly reduces CIM data-verification time compared with sequential K-means clustering, while achieving a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
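The general idea of expressing K-means as MapReduce rounds can be illustrated with a minimal in-process sketch. It is not the authors' implementation: it leaves out the CIM verification step, stochastic sampling, and the genetic algorithm-based load balancing named in the record, and it simulates the map and reduce phases in plain Python rather than on a Hadoop cluster. All function names (nearest, map_phase, reduce_phase, kmeans) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[Point, int]  # (coordinate-wise sum, number of points)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def add(a: Point, b: Point) -> Point:
    """Coordinate-wise sum of two points."""
    return tuple(x + y for x, y in zip(a, b))


def map_phase(split: Sequence[Point], centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
    """Mapper for one data split: emit (cluster index, (partial sum, count)).

    Points are pre-aggregated per cluster inside the split, which plays the
    role of a combiner and keeps the emitted data small.
    """
    partial: Dict[int, Partial] = {}
    for point in split:
        k = nearest(point, centroids)
        if k in partial:
            s, n = partial[k]
            partial[k] = (add(s, point), n + 1)
        else:
            partial[k] = (point, 1)
    return list(partial.items())


def reduce_phase(pairs: Sequence[Tuple[int, Partial]], centroids: Sequence[Point]) -> List[Point]:
    """Reducer: merge partial sums per cluster and compute the new centroids."""
    totals: Dict[int, Partial] = {}
    for k, (s, n) in pairs:
        if k in totals:
            ts, tn = totals[k]
            totals[k] = (add(ts, s), tn + n)
        else:
            totals[k] = (s, n)
    updated = list(centroids)  # clusters that received no points keep their centroid
    for k, (s, n) in totals.items():
        updated[k] = tuple(v / n for v in s)
    return updated


def kmeans(splits: Sequence[Sequence[Point]], centroids: Sequence[Point], iterations: int = 10) -> List[Point]:
    """Run map/reduce rounds until the centroids stop changing or the limit is hit."""
    current = list(centroids)
    for _ in range(iterations):
        pairs = [kv for split in splits for kv in map_phase(split, current)]
        updated = reduce_phase(pairs, current)
        if updated == current:
            break
        current = updated
    return current


if __name__ == "__main__":
    # Two splits stand in for data blocks held on two worker nodes.
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In this sketch each split is processed independently, so the map calls could run on separate nodes; only the small per-cluster partial sums need to reach the reducer.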
Pages: 3096-3114
Number of pages: 19