A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Cited by: 10

Authors:

Deng, Chuang [1]

Liu, Yang [1]

Xu, Lixiong [1]

Yang, Jie [1]

Liu, Junyong [1]

Li, Siguang [3]

Li, Maozhen [2,3]

Affiliations:

[1] Sichuan Univ, Sch Elect Engn & Informat, Chengdu 610065, Peoples R China

[2] Brunel Univ London, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England

[3] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2016, Vol. 28, No. 11

Funding:

National Science Foundation (USA);

Keywords:

CIM verification; stochastic sampling; clustering; MapReduce; load balancing;

DOI:

10.1002/cpe.3580

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification. Copyright (C) 2015 John Wiley & Sons, Ltd.
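The abstract describes K-means expressed in the MapReduce model but gives no implementation details. As a rough illustration only, the following single-process Python sketch shows how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster). It is not the authors' code: the function names `map_assign`, `reduce_mean`, `kmeans_iteration`, and `kmeans` are hypothetical, the shuffle phase is simulated with a dictionary, and the genetic algorithm-based load balancing and the stochastic sampling mentioned in the record are not modeled.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step (illustrative): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step (illustrative): coordinate-wise mean of one cluster's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)  # stands in for the MapReduce shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # An empty cluster keeps its previous centroid (an assumption of this sketch).
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat iterations until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment the map calls would run in parallel over data splits and each iteration would be a separate job; the sketch only mirrors the data flow.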


Pages: 3096-3114

Number of pages: 19

