SMK-means: An Improved Mini Batch K-means Algorithm Based on Mapreduce with Big Data

Cited by: 32

Authors:

Xiao, Bo [1]

Wang, Zhen [2]

Liu, Qi [3]

Liu, Xiaodong [3]

Affiliations:

[1] Nanjing Univ Informat Sci & Technol, Jiangsu Key Lab Atmospher Environm Monitoring & P, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Sch Environm Sci & Engn, 219 Ningliu Rd, Nanjing 210044, Jiangsu, Peoples R China

[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, 219 Ningliu Rd, Nanjing 210044, Jiangsu, Peoples R China

[3] Edinburgh Napier Univ, Sch Comp, 10 Colinton Rd, Edinburgh EH10 5DT, Midlothian, Scotland

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2018 / Vol. 56 / Issue 03

Funding:

National Social Science Fund of China;

Keywords:

Big data; outlier detection; SMK-means; Mini Batch K-means; simulated annealing; NETWORK;

DOI:

10.3970/cmc.2018.01830

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In recent years, big data technology has developed rapidly and attracted growing attention from researchers. Problems of massive data storage and computation have been solved, but outlier detection in massive data has emerged alongside them, and more research has been devoted to it. However, existing methods have high computation time. This paper proposes an improved outlier detection algorithm with higher detection performance. SMK-means is a fusion algorithm that combines Mini Batch K-means with simulated annealing for anomaly detection in massive household electricity data; it can determine the number of clusters, reduce the number of iterations, and improve clustering accuracy. Several experiments compare and analyze the algorithm's performance on multiple measures. The analysis indicates that the proposed algorithm outperforms the existing algorithms it was compared with.
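As a rough illustration of the building blocks the abstract names, the following is a minimal sketch of plain mini-batch K-means followed by a distance-based outlier flag. It is not the SMK-means algorithm itself: the abstract gives no detail on the simulated annealing step or the MapReduce layout, so both are omitted here. The function names, the optional `init` argument, the per-centre learning rate of 1/count, and the "3 × median distance" threshold are all assumptions made for this example.

```python
import math
import random
import statistics


def nearest(point, centers):
    """Return (index, distance) of the centre closest to point."""
    dists = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


def mini_batch_kmeans(points, k, batch_size=20, n_iter=100, seed=0, init=None):
    """Plain mini-batch K-means; each centre uses a learning rate of 1/count."""
    rng = random.Random(seed)
    # Assumption: start from caller-supplied centres, else k random points.
    start = init if init is not None else rng.sample(points, k)
    centers = [list(p) for p in start]
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch against the current centres, then update.
        assigned = [nearest(p, centers)[0] for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


def flag_outliers(points, centers, factor=3.0):
    """Flag points farther from their nearest centre than factor * median distance."""
    dists = [nearest(p, centers)[1] for p in points]
    threshold = factor * statistics.median(dists)
    return [d > threshold for d in dists]
```

Calling `flag_outliers(points, mini_batch_kmeans(points, k=2))` on a list of coordinate tuples returns one boolean per point. The threshold rule is only one simple choice; a real electricity-usage dataset would likely need a criterion tuned to its own distance distribution.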


Pages: 365-379

Page count: 15


