Design and implementation of K-means parallel algorithm based on Hadoop

Citations: 0
Authors
Jia, Jiyang [1]
Xie, Hui [1]
Xu, Tao [1]
Affiliations
[1] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang, Shandong, Peoples R China
Keywords
K-means; Hadoop; MapReduce framework; Nearest neighbor degree
DOI
10.1145/3469213.3470413
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To address the low efficiency and instability of K-means clustering in big-data environments, a parallel K-means algorithm based on Hadoop is proposed. The number of clusters is first determined with the elbow method, the initial cluster centers are then chosen based on the ideas of density and proximity, and the MapReduce framework of the Hadoop ecosystem is used to parallelize the computation. Experiments show that the algorithm can improve the efficiency and convergence of K-means clustering on massive data.
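As an illustration of how K-means maps onto the MapReduce model, the following is a minimal single-process sketch, not the authors' implementation. The map step assigns each point to its nearest center, a dictionary stands in for the shuffle phase, and the reduce step averages the points of each cluster. All function names (`map_assign`, `reduce_mean`, `kmeans_iteration`, `kmeans`) are hypothetical, no Hadoop API is used, and the elbow method and the density- and proximity-based initialization mentioned in the abstract are not shown because the abstract does not specify them; the initial centers are simply passed in.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of nearest center, (partial sum, count))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def reduce_mean(values):
    """Reduce step: combine (partial sum, count) pairs into a new center."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for partial_sum, n in values:
        for d in range(dims):
            total[d] += partial_sum[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centers):
    """One map -> shuffle -> reduce round, simulated in a single process."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centers)
        groups[key].append(value)  # stands in for the shuffle phase
    # A center with no assigned points keeps its previous position.
    return [reduce_mean(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat rounds until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return new_centers
        centers = new_centers
    return centers
```

In an actual Hadoop job, each round would typically be one MapReduce job, with the current centers distributed to every mapper and the reducer output read back as the centers for the next round.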
Pages: 4
Related papers
50 in total
  • [21] Implementation of K-means Algorithm on FPGA
    Altuncu, Mehmet Ali
    Turkoglu, Bahadir
    Cavuslu, Mehmet Ali
    Sahin, Suhap
    [J]. 2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [22] Analysis and Research of K-means Algorithm in Soil Fertility Based on Hadoop Platform
    Chen, Guifen
    Yang, Yuqin
    Guo, Hongliang
    Sun, Xionghui
    Chen, Hang
    Cai, Lixia
    [J]. Computer and Computing Technologies in Agriculture VIII, 2015, 452 : 304 - 312
  • [23] Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework
    Lu, Weijia
    [J]. JOURNAL OF GRID COMPUTING, 2020, 18 (02) : 239 - 250
  • [24] Cloud Computing K-Means Text Clustering Filtering Algorithm based on Hadoop
    Huang Suyu
    [J]. Proceedings of the 2016 4th International Conference on Machinery, Materials and Information Technology Applications, 2016, 71 : 1516 - 1521
  • [25] Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework
    Weijia Lu
    [J]. Journal of Grid Computing, 2020, 18 : 239 - 250
  • [26] Genetic Algorithm Based Parallel K-Means Data Clustering Algorithm Using MapReduce Programming Paradigm on Hadoop Environment (GAPKCA)
    Alshammari, Sayer
    Zolkepli, Maslina Binti
    Abdullah, Rusli Bin
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING (SCDM 2020), 2020, 978 : 98 - 108
  • [27] A Parallel Genetic K-Means Algorithm based on the Island Model
    Wang, Xikang
    Wang, Tongxi
    Xiang, Hua
    Huang, Lan
    [J]. ENGINEERING LETTERS, 2024, 32 (08) : 1632 - 1643
  • [28] CUDA-based parallel K-means clustering algorithm
    Huo, Yingqiu
    Qin, Renbo
    Xing, Caiyan
    Chen, Xi
    Fang, Yong
    [J]. Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2014, 45 (11) : 47 - 53
  • [29] Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering
    Ansari Z.
    Afzal A.
    Sardar T.H.
    [J]. Journal of The Institution of Engineers (India): Series B, 2019, 100 (02) : 95 - 103
  • [30] Accelerate K-means Algorithm by Using GPU in the Hadoop Framework
    Zheng, HuanXin
    Wu, JunMin
    [J]. WEB-AGE INFORMATION MANAGEMENT: WAIM 2014 INTERNATIONAL WORKSHOPS, 2014, 8597 : 177 - 186