Design and implementation of K-means parallel algorithm based on Hadoop

Times cited: 0

Authors:

Jia, Jiyang [1]

Xie, Hui [1]

Xu, Tao [1]

Affiliation:

[1] Zaozhuang Univ, Coll Informat Sci & Engn, Zaozhuang, Shandong, Peoples R China

Source:

PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21) | 2021

Keywords:

K-means; Hadoop; MapReduce framework; Nearest neighbor degree

DOI:

10.1145/3469213.3470413

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

To address the low efficiency and instability of K-means clustering in big data environments, this paper proposes a parallel K-means algorithm based on Hadoop. The initial number of clusters K is determined with the elbow method, the initial cluster centers are then selected based on the ideas of density and proximity, and the algorithm is parallelized with the MapReduce framework of the Hadoop ecosystem. Experiments show that the algorithm improves the efficiency and convergence of K-means clustering on massive data.
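The abstract names three steps: choose K with the elbow method, seed the initial centers by density and proximity, and run each K-means iteration as a MapReduce job. The following is a minimal single-machine sketch of that pipeline in plain Python, not the authors' implementation. `map_phase` and `reduce_phase` only simulate a Hadoop Mapper/Reducer pair in memory. The seeding rule (density = number of neighbors within the mean pairwise distance, weighted by distance to already-chosen centers) and the elbow rule (largest second difference of the SSE curve) are assumptions, because the abstract specifies neither. All function names are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(points, centers):
    """Simulated mapper: emit (nearest-center id, (point, 1)) for every point."""
    for p in points:
        cid = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        yield cid, (p, 1)


def reduce_phase(pairs, dim):
    """Simulated shuffle + reducer: sum points and counts per center id, then average."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for cid, (p, n) in pairs:
        for j, x in enumerate(p):
            sums[cid][j] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in acc) for cid, acc in sums.items()}


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Driver: one simulated MapReduce job per iteration until centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centers), len(points[0]))
        # A cluster that received no points keeps its previous center.
        updated = [new.get(i, c) for i, c in enumerate(centers)]
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def density_init(points, k):
    """Hypothetical density/proximity seeding (assumed rule, not the paper's)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]  # O(n^2): sketch only
    eps = sum(map(sum, d)) / (n * (n - 1))  # assumed radius: mean pairwise distance
    density = [sum(1 for j in range(n) if j != i and d[i][j] <= eps) for i in range(n)]
    chosen = [max(range(n), key=lambda i: density[i])]  # densest point first
    while len(chosen) < k:
        # Prefer points that are dense AND far from every center chosen so far;
        # the +1 keeps zero-density points ranked by distance alone.
        nxt = max((i for i in range(n) if i not in chosen),
                  key=lambda i: (density[i] + 1) * min(d[i][c] for c in chosen))
        chosen.append(nxt)
    return [points[i] for i in chosen]


def elbow_k(points, k_max):
    """Pick k at the sharpest bend of the SSE curve (assumed rule; needs k_max >= 3)."""
    curve = [sse(points, kmeans(points, density_init(points, k)))
             for k in range(1, k_max + 1)]
    # curve[i] is the SSE for k = i + 1.
    bend = max(range(1, k_max - 1),
               key=lambda i: curve[i - 1] - 2 * curve[i] + curve[i + 1])
    return bend + 1, curve


# Usage: three well-separated 2-D groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (20, 0), (20, 1), (21, 0), (21, 1)]
k, curve = elbow_k(data, 6)
centers = kmeans(data, density_init(data, k))
print(k, sorted(centers))  # 3 [(0.5, 0.5), (10.5, 10.5), (20.5, 0.5)]
```

In an actual Hadoop job the mapper would receive the current centers from a file shipped with the job, and the reducer output would become the next job's input centers. The `kmeans` loop above stands in for that job chaining. The full distance matrix in `density_init` is only suitable for small inputs; a big-data version would need to sample or partition the data before seeding.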


Pages: 4
