K-means Clustering Algorithm for Large-scale Chinese Commodity Information Web Based on Hadoop

Cited by: 2
Authors
Geng Yushui [1]
Zhang Lishuo [1]
Affiliations
[1] Qilu Univ Technol, Sch Informat, Jinan 250353, Peoples R China
Keywords
K-Means clustering algorithm; Hadoop platform; MapReduce; Cloud computing; Big Data;
DOI
10.1109/DCABES.2015.71
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
With the growing popularity of the Internet, product information is spread across a great many web pages, and finding the needed information on these pages usually calls for clustering. The current explosive growth of data produces massive volumes of stored information, so clustering faces high computational complexity and long running times, and the traditional K-Means clustering algorithm no longer meets the needs of today's big data environments. This article therefore combines the advantages of the Hadoop platform and the MapReduce programming model to propose a K-Means clustering algorithm for large-scale Chinese commodity information on the Web based on Hadoop. The Map function calculates the distance from each sample to the cluster centers and labels the sample with its category; the Reduce function summarizes the intermediate results and calculates the new cluster centers for the next round of iteration. Experimental results show that this method improves clustering processing speed.
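The Map and Reduce roles described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`), the use of Euclidean distance, and the stopping parameters `max_iter` and `tol` are all assumptions made for illustration, and the shuffle step that Hadoop performs between Map and Reduce is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(sample, centers):
    """Map step (assumed form): emit (index of nearest center, sample)."""
    nearest = min(range(len(centers)), key=lambda i: dist(sample, centers[i]))
    return nearest, sample


def kmeans_reduce(cluster_id, samples):
    """Reduce step (assumed form): average the samples of one cluster."""
    n = len(samples)
    dims = len(samples[0])
    mean = tuple(sum(s[d] for s in samples) / n for d in range(dims))
    return cluster_id, mean


def kmeans_iteration(samples, centers):
    """One round: map every sample, group by cluster id, reduce each group."""
    groups = defaultdict(list)  # stands in for the MapReduce shuffle phase
    for sample in samples:
        cluster_id, value = kmeans_map(sample, centers)
        groups[cluster_id].append(value)
    new_centers = list(centers)  # a center with no samples stays where it is
    for cluster_id, members in groups.items():
        _, new_centers[cluster_id] = kmeans_reduce(cluster_id, members)
    return new_centers


def kmeans(samples, centers, max_iter=20, tol=1e-6):
    """Repeat rounds until the centers move less than tol (hypothetical rule)."""
    for _ in range(max_iter):
        updated = kmeans_iteration(samples, centers)
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(1, 1), (9, 9)])` assigns the first two points to one cluster and the last two to the other. On a real cluster, each round would be one MapReduce job, with the current centers distributed to every mapper before the job starts.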
Pages: 256-259
Page count: 4
Related papers
  • [1] The Application of K-Means Clustering Algorithm Based on Hadoop
    Zhong, Yurong
    Liu, Dan
    [J]. PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 88 - 92
  • [2] Scalable k-means for large-scale clustering
    Ming, Yuewei
    Zhu, En
    Wang, Mao
    Liu, Qiang
    Liu, Xinwang
    Yin, Jianping
    [J]. INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 825 - 838
  • [3] Compressed K-Means for Large-Scale Clustering
    Shen, Xiaobo
    Liu, Weiwei
    Tsang, Ivor
    Shen, Fumin
    Sun, Quan-Sen
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2527 - 2533
  • [4] Optimization of K-means Clustering Algorithm Based on Hadoop Platform
    Duan, A. L.
    Xu, Z. X.
    Zhang, H. J.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENVIRONMENTAL ENGINEERING (CSEE 2015), 2015, : 1195 - 1203
  • [5] An Improved K-means Clustering Algorithm Based on Hadoop Platform
    Hou, Xiangru
    [J]. CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 1101 - 1109
  • [6] A Semantic Partition Algorithm Based on Improved K-Means Clustering for Large-Scale Indoor Areas
    Shi, Kegong
    Yan, Jinjin
    Yang, Jinquan
    [J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2024, 13 (02)
  • [7] Optimal Operation of Large-scale Electric Vehicles Based on Improved K-means Clustering Algorithm
    Liu, Jian
    Xu, Weifeng
    Liu, Zhijun
    Fu, Guanhua
    Jiang, Yunpeng
    Zhao, Ergang
    [J]. PROCEEDINGS OF 2022 5TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA2022, 2022, : 23 - 28
  • [8] Efficient adaptive large-scale text clustering method based on genetic K-means algorithm
    Dai, Wenhua
    Jiao, Cuizhen
    He, Tingting
    [J]. RECENT ADVANCE OF CHINESE COMPUTING TECHNOLOGIES, 2007, : 281 - 285
  • [9] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    [J]. 2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 301 - 307
  • [10] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    [J]. 2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93