K-means Clustering Algorithm for Large-scale Chinese Commodity Information Web Based on Hadoop

Cited by: 2
Authors
Geng Yushui [1]
Zhang Lishuo [1]
Affiliation
[1] Qilu Univ Technol, Sch Informat, Jinan 250353, Peoples R China
Keywords
K-Means clustering algorithm; Hadoop platform; MapReduce; Cloud computing; Big Data
DOI
10.1109/DCABES.2015.71
Chinese Library Classification (CLC) number
TP39 [Applications of computers]
Discipline classification codes
081203; 0835
Abstract
With the growing popularity of the Internet, product information is spread across a large number of web pages, and extracting the information one needs from these pages often calls for clustering. The explosive growth of data has led to massive-scale storage, so clustering now faces problems such as high computational complexity and long running times, and the traditional K-Means clustering algorithm no longer meets the needs of today's big-data environments. This paper therefore combines the advantages of the Hadoop platform and the MapReduce programming model and proposes a Hadoop-based K-Means clustering algorithm for large-scale Chinese commodity information on the Web. The Map function computes the distance from each sample to the cluster centers and assigns the sample to its cluster; the Reduce function aggregates the intermediate results and computes the new cluster centers for the next iteration. Experimental results show that the method improves clustering speed.
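The Map/Reduce division of labour described in the abstract can be illustrated with a minimal single-process sketch. It assumes numeric feature vectors and Euclidean distance (the abstract does not name a distance measure), uses an in-memory dictionary in place of Hadoop's shuffle/sort step, and adds a hypothetical convergence threshold `tol`. The function names `kmeans_map`, `kmeans_reduce`, `run_iteration` and `kmeans` are illustrative only; this is not the authors' implementation and does not use the Hadoop API.

```python
import math
from collections import defaultdict
from typing import Iterable, Iterator

Point = tuple[float, ...]


def kmeans_map(points: Iterable[Point], centers: list[Point]) -> Iterator[tuple[int, Point]]:
    """Map phase (hypothetical): emit (index of nearest center, point) per sample."""
    for p in points:
        # Squared Euclidean distance is sufficient to pick the nearest center.
        idx = min(range(len(centers)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        yield idx, p


def kmeans_reduce(idx: int, members: list[Point]) -> Point:
    """Reduce phase (hypothetical): average the points grouped under one center key."""
    dim = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dim))


def run_iteration(points: list[Point], centers: list[Point]) -> list[Point]:
    # Stand-in for Hadoop's shuffle: group mapper output by key.
    groups: dict[int, list[Point]] = defaultdict(list)
    for idx, p in kmeans_map(points, centers):
        groups[idx].append(p)
    new_centers = list(centers)  # an empty cluster keeps its previous center
    for idx, members in groups.items():
        new_centers[idx] = kmeans_reduce(idx, members)
    return new_centers


def kmeans(points: list[Point], centers: list[Point],
           max_iter: int = 20, tol: float = 1e-6) -> list[Point]:
    # Driver loop: each round corresponds to one MapReduce job in a Hadoop setting.
    for _ in range(max_iter):
        new_centers = run_iteration(points, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # assumed stopping rule: centers have stopped moving
            break
    return centers


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    print(kmeans(pts, [(1.0, 1.0), (8.0, 8.0)]))  # [(1.25, 1.5), (8.5, 8.25)]
```

In an actual Hadoop job the current centers would be distributed to every mapper at the start of each round, and the driver would launch one MapReduce job per iteration until the centers stop changing.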
Pages: 256-259
Number of pages: 4
Related papers (50 in total)
  • [31] Large scale K-means clustering using GPUs
    Li, Mi
    Frank, Eibe
    Pfahringer, Bernhard
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (01) : 67 - 109
  • [32] A Clustering Method Based on K-Means Algorithm
    Li, Youguo
    Wu, Haiyan
    [J]. INTERNATIONAL CONFERENCE ON SOLID STATE DEVICES AND MATERIALS SCIENCE, 2012, 25 : 1104 - 1109
  • [33] A Fuzzy Clustering Algorithm Based on K-means
    Yan, Zhen
    Pi, Dechang
    [J]. ECBI: 2009 INTERNATIONAL CONFERENCE ON ELECTRONIC COMMERCE AND BUSINESS INTELLIGENCE, PROCEEDINGS, 2009, : 523 - 528
  • [34] K-means algorithm based on particle swarm optimization for web document clustering
    Xiao, L. Z.
    Shao, Z. Q.
    Gu, X. M.
    [J]. DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 980 - 984
  • [35] MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Ahmad, Shahbaaz
    [J]. 2016 SECOND INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE & COMMUNICATION TECHNOLOGY (CICT), 2016, : 192 - 198
  • [36] An Improved Hierarchical K-Means Algorithm for Web Document Clustering
    Liu, Yongxin
    Liu, Zhijng
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 606 - 610
  • [37] A Kind of Hierarchical K-means Web Log Clustering Algorithm
    Liu Li Xia
    Zhuang Yi Qi
    [J]. ADVANCED MEASUREMENT AND TEST, PARTS 1 AND 2, 2010, 439-440 : 481 - 485
  • [38] Design and implementation of K-means parallel algorithm based on Hadoop
    Jia, Jiyang
    Xie, Hui
    Xu, Tao
    [J]. PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,
  • [39] Energy efficient grid based k-means clustering algorithm for large scale wireless sensor networks
    Ben Gouissem, Bechir
    Gantassi, Rahma
    Hasnaoui, Salem
    [J]. INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2022, 35 (14)
  • [40] Consensus model based on probability K-means clustering algorithm for large scale group decision making
    Liu, Qian
    Wu, Hangyao
    Xu, Zeshui
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (06) : 1609 - 1626