Research and Improve on K-means Algorithm Based on Hadoop

Cited by: 0
Authors
Wu, Kehe [1]
Zeng, Wenjing [1]
Wu, Tingting [1]
An, Yanwen [1]
Affiliation
[1] North China Electric Power University, Department of Computer Science and Technology, Beijing, People's Republic of China
Keywords
Data Mining; K-means; MapReduce; Hadoop
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
With the advent of the big data era, traditional data mining algorithms are no longer adequate for analyzing, managing and mining massive data. The development of cloud computing has brought new life to algorithm parallelization. In this paper, we study the K-means clustering algorithm and attempt to improve it: we sample the large-scale data and use the convex hull of the sample and its antipodal points to determine the first two initial cluster centers. We then use the MapReduce programming model to parallelize the whole process. Finally, using the Reuters-21578 news dataset as the data source, we conducted comparative experiments with different distance measures, serial versus parallel execution, and different numbers of cluster nodes to verify the efficiency of the improved algorithm. The results show that, compared with the serial algorithm, the improved parallel algorithm achieves clear gains in both reliability and efficiency as the number of cluster nodes and the data size increase.
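The abstract only outlines the initialization and parallelization steps. The Python sketch below shows one way they could fit together; it is not the authors' implementation and rests on several assumptions. Points are 2-D, because the abstract does not say how a convex hull is applied to high-dimensional Reuters-21578 text vectors. Only k = 2 is demonstrated, because the abstract does not describe how any further centers are chosen. Euclidean distance is used. The MapReduce map, shuffle and reduce steps are simulated in a single process rather than run on Hadoop. All function names (`initial_two_centers`, `map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical. For brevity, the farthest pair of hull vertices is found by checking every vertex pair. Because the diameter of a point set is always realized by an antipodal (opposite) pair of hull vertices, this yields the same maximum distance that a rotating-calipers search over antipodal points would, in O(h^2) rather than O(h) time.

```python
import math
import random
from collections import defaultdict

Point = tuple[float, float]  # assumption: 2-D points, so a planar convex hull applies


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop each chain's last point (it starts the other chain)


def initial_two_centers(data: list[Point], sample_size: int, seed: int = 0) -> list[Point]:
    """Hypothetical helper: sample, build the hull, return the farthest pair of hull vertices."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    hull = convex_hull(sample)
    if len(hull) < 2:
        raise ValueError("need at least two distinct points")
    # The diameter is always realized by an antipodal pair of hull vertices; checking
    # every vertex pair (O(h^2)) finds the same distance rotating calipers finds in O(h).
    a, b = max(((p, q) for i, p in enumerate(hull) for q in hull[i + 1:]),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return [a, b]


def map_phase(point: Point, centers: list[Point]) -> tuple[int, tuple[Point, int]]:
    """Mapper: emit (index of the nearest center, (partial coordinate sum, count))."""
    nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return nearest, (point, 1)


def reduce_phase(values: list[tuple[Point, int]]) -> Point:
    """Reducer: add partial sums and counts for one cluster, then divide to get the mean."""
    n = sum(count for _, count in values)
    sx = sum(s[0] for s, _ in values)
    sy = sum(s[1] for s, _ in values)
    return (sx / n, sy / n)


def kmeans_mapreduce(data: list[Point], centers: list[Point],
                     max_iter: int = 20, tol: float = 1e-6) -> list[Point]:
    """Run K-means, simulating each iteration as one in-process map/shuffle/reduce job."""
    for _ in range(max_iter):
        groups: dict[int, list[tuple[Point, int]]] = defaultdict(list)
        for point in data:  # map each point to its nearest center
            key, value = map_phase(point, centers)
            groups[key].append(value)  # shuffle: group values by cluster key
        new_centers = [reduce_phase(groups[i]) if groups[i] else centers[i]  # reduce
                       for i in range(len(centers))]
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no center moves noticeably
            break
    return centers


if __name__ == "__main__":
    rng = random.Random(1)
    data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
            + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)])
    init = initial_two_centers(data, sample_size=100)
    print("initial centers:", init)
    print("final centers:", sorted(kmeans_mapreduce(data, init)))
```

In this sketch, the mapper emits (partial sum, count) pairs and the reducer divides their totals. That is a common way to express the K-means center update in MapReduce, because the same reducer logic also works if a combiner pre-aggregates partial sums on each node before the shuffle.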
Pages: 334-337
Number of pages: 4