Design and implementation of K-means parallel algorithm based on Hadoop

Cited by: 0
Authors
Jia, Jiyang [1]
Xie, Hui [1]
Xu, Tao [1]
Affiliations
[1] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang, Shandong, Peoples R China
Keywords
K-means; Hadoop; MapReduce framework; Nearest neighbor degree
DOI
10.1145/3469213.3470413
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To address the low efficiency and instability of K-means clustering in big data environments, a parallel K-means algorithm based on Hadoop is proposed. The number of clusters is first determined with the elbow method, the initial cluster centers are then chosen based on density and proximity, and the computation is parallelized with the MapReduce framework of the Hadoop ecosystem. Experiments show that the algorithm improves the efficiency and convergence of K-means clustering on massive data.
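The MapReduce step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it runs in a single Python process instead of on Hadoop, the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) are hypothetical, and the initial centers are simply passed in by the caller rather than derived from the elbow method or the density-and-proximity rule, which the abstract does not specify in detail.

```python
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(split, centers):
    """Mapper: for each point in one input split, emit (cluster index, (point, 1))."""
    for point in split:
        yield nearest(point, centers), (point, 1)


def reduce_phase(pairs, centers):
    """Reducer: sum the points per cluster and return the updated centers."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        acc = sums.setdefault(idx, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[idx] += n
    # Assumption: a cluster that received no points keeps its previous center.
    return [
        tuple(v / counts[i] for v in sums[i]) if i in sums else centers[i]
        for i in range(len(centers))
    ]


def kmeans(splits, centers, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol (squared)."""
    for _ in range(max_iter):
        # On a cluster, each split would be handled by a separate mapper.
        pairs = [kv for split in splits for kv in map_phase(split, centers)]
        new_centers = reduce_phase(pairs, centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    # Two hypothetical input splits, as if stored in two separate blocks.
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

Each pass over the data corresponds to one MapReduce job: the mappers only need the current centers, and the reducers only need per-cluster sums and counts, which is why the assignment step parallelizes across input splits.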
Pages: 4