SMK-means: An Improved Mini Batch K-means Algorithm Based on Mapreduce with Big Data

Cited by: 32
Authors
Xiao, Bo [1]
Wang, Zhen [2]
Liu, Qi [3]
Liu, Xiaodong [3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Jiangsu Key Lab Atmospher Environm Monitoring & P, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Sch Environm Sci & Engn, 219 Ningliu Rd, Nanjing 210044, Jiangsu, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, 219 Ningliu Rd, Nanjing 210044, Jiangsu, Peoples R China
[3] Edinburgh Napier Univ, Sch Comp, 10 Colinton Rd, Edinburgh EH10 5DT, Midlothian, Scotland
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2018 / Vol. 56 / No. 03
Funding
National Social Science Fund of China;
Keywords
Big data; outlier detection; SMK-means; Mini Batch K-means; simulated annealing; NETWORK;
DOI
10.3970/cmc.2018.01830
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, big data technology has developed rapidly and attracted growing attention from researchers. Problems of massive data storage and computation have largely been solved, but outlier detection in massive data has emerged alongside them, and more research is now devoted to it. However, the existing methods have high computation time. This paper proposes an improved outlier detection algorithm with higher detection performance. SMK-means is a fusion algorithm that combines Mini Batch K-means with simulated annealing to detect anomalies in massive household electricity data. It can determine the number of clusters, reduce the number of iterations, and improve clustering accuracy. Several experiments are performed to compare and analyze the algorithm on multiple performance measures. The analysis indicates that the proposed algorithm is superior to the existing algorithms.
Pages: 365-379
Page count: 15
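
As a rough illustration of the kind of pipeline the abstract describes, the sketch below runs a plain mini-batch K-means and then flags the points that lie farthest from their nearest centroid. It is not the authors' SMK-means: the simulated annealing step that selects the number of clusters and the MapReduce parallelization are omitted, and the function names, the per-centroid learning rate, and the quantile threshold are all assumptions made for this example.

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=64, n_iter=100, seed=0):
    """Generic mini-batch K-means (illustrative, not the paper's SMK-means)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initialize centroids from k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)
    size = min(batch_size, len(X))
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=size, replace=False)]
        # Assign each batch point to its nearest centroid.
        dist = np.linalg.norm(batch[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid toward its points with a 1/count learning rate.
        for x, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = (1.0 - eta) * centers[j] + eta * x
    return centers

def flag_outliers(X, centers, quantile=0.99):
    """Flag points whose nearest-centroid distance exceeds a quantile.

    The quantile threshold is a hypothetical choice for this sketch.
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = dist.min(axis=1)
    return nearest > np.quantile(nearest, quantile)
```

Calling `mini_batch_kmeans(X, k)` on an array of shape `(n_samples, n_features)` returns `k` centroids, and `flag_outliers(X, centers)` returns a boolean mask over the rows of `X`. Here `k` must be supplied by hand, whereas the abstract states that SMK-means determines the number of clusters itself.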