A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10
Authors
Deng, Chuang [1]
Liu, Yang [1]
Xu, Lixiong [1]
Yang, Jie [1]
Liu, Junyong [1]
Li, Siguang [3]
Li, Maozhen [2, 3]
Affiliations
[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China
[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
Source
Funding
US National Science Foundation;
Keywords
CIM verification; stochastic sampling; clustering; MapReduce; load balancing;
DOI
10.1002/cpe.3580
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With the rapid deployment of digital devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely adopted by the community for data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among heterogeneous computing nodes and further improve computation efficiency. The performance of the parallel K-means is first evaluated on a small-scale in-house MapReduce cluster and then on a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means significantly reduces the CIM data-verification time compared with sequential K-means clustering, while achieving a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
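For readers unfamiliar with how K-means maps onto the MapReduce model mentioned in the abstract, the following is a minimal single-process sketch, not the authors' implementation. It assumes plain Euclidean distance on numeric tuples, simulates the map and reduce phases with ordinary Python functions instead of a Hadoop cluster, and leaves out the genetic algorithm-based load balancing entirely. All function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: for each point in one data split, emit (nearest centroid index, point)."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: group points by centroid index and average each group."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce until no centroid moves by more than tol."""
    for _ in range(max_iter):
        # On a real cluster each split would be handled by a separate mapper.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Two splits stand in for data blocks stored on two worker nodes.
splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
result = kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)])
```

Each iteration corresponds to one MapReduce job: the only state shared between mappers is the current list of centroids, which is why the assignment step parallelizes across data splits.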
Pages: 3096-3114
Page count: 19