Research and implementation of user clustering based on MapReduce in multimedia big data

Cited by: 0
Author
Tongke Fan
Affiliation
[1] Xi’an International University, School of Information and Network
Keywords
Multimedia big data; Cloud computing; Hadoop; MapReduce; Clustering algorithm
DOI
Not available
Abstract
Massive data sets are hard to interpret and slow to cluster, which is a central problem in the big data setting. To address it, a Canopy + K-means clustering algorithm is proposed and implemented on the MapReduce programming model, making full use of the computing and storage capacity of a Hadoop cluster. A large population of Taobao buyers serves as the case study, carried out with Mahout, the data mining library of the Hadoop platform. A general procedure for mining with Mahout is also given. The MapReduce-based clustering algorithm shows favorable clustering quality and running speed. Canopy + K-means is compared with plain K-means in terms of runtime, speed-up ratio, and scalability; both algorithms are tested on clusters with different numbers of nodes and on datasets of various sizes. The experimental results show that Canopy + K-means runs faster than K-means. Both achieve a good speed-up ratio in the Hadoop environment, with Canopy + K-means performing better than K-means.
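The abstract names the Canopy + K-means combination without showing how the two stages connect. The following is a minimal single-machine sketch of the general idea, not the paper's MapReduce or Mahout implementation: a cheap Canopy pass picks the initial centers (and therefore the number of clusters), and standard K-means iterations then refine them. The function names, the Euclidean distance, and the thresholds `t1` (loose) and `t2` (tight) are assumptions made for illustration.

```python
import math


def mean(pts):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))


def canopy_centers(points, t1, t2):
    """Canopy pass (illustrative): return one center per canopy.

    t1 is the loose threshold (canopy membership), t2 the tight one
    (points this close to a seed can no longer start a new canopy).
    """
    if not t1 > t2:
        raise ValueError("expected t1 > t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every point within t1 of the seed belongs to this canopy.
        members = [p for p in points if math.dist(seed, p) <= t1]
        centers.append(mean(members))
        # Drop points within t2 of the seed from the candidate seeds.
        remaining = [p for p in remaining if math.dist(seed, p) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = canopy_centers(data, t1=3.0, t2=2.0)
    final_centers, groups = kmeans(data, seeds)
    print(len(seeds), "canopies ->", final_centers)
```

In a MapReduce setting, the assignment step would typically run in the mappers and the center update in the reducers; this sketch leaves that distribution out.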
Pages: 10017-10031
Page count: 14
Citation
Fan, Tongke. Research and implementation of user clustering based on MapReduce in multimedia big data [J]. Multimedia Tools and Applications, 2018, 77 (08): 10017-10031