Research and implementation of user clustering based on MapReduce in multimedia big data

Author
Tongke Fan
Affiliation
[1] Xi’an International University, School of Information and Network
Keywords
Multimedia big data; Cloud computing; Hadoop; MapReduce; Clustering algorithm
Abstract
Massive data sets are hard to interpret and slow to cluster, which is a recurring problem in the big data context. To address it, a Canopy + K-means clustering algorithm is proposed, and the MapReduce programming model is used to make full use of the computing and storage capacity of a Hadoop cluster. A large number of buyers on Taobao serve as the application context for a case study carried out with Mahout, the data mining library of the Hadoop platform. A general procedure for mining with Mahout is also given. The MapReduce-based clustering algorithm shows good clustering quality and operation speed. The Canopy + K-means algorithm is compared with the K-means algorithm in terms of runtime, speed-up ratio, and scalability. Both clustering algorithms are tested on clusters with different numbers of nodes, using datasets of various scales. The experimental results show that the Canopy + K-means algorithm runs faster than the K-means algorithm. Both show a good speed-up ratio in the Hadoop environment, and that of the Canopy + K-means algorithm is better than that of the K-means algorithm.
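As an illustration of the Canopy + K-means idea named in the abstract, the following is a minimal single-machine Python sketch. It is not the paper's MapReduce or Mahout implementation. It assumes Euclidean distance and two hypothetical thresholds, t1 (loose) and t2 (tight, with t2 < t1); the function names canopy_centers and kmeans are illustrative only. A cheap Canopy pass picks coarse centers, and those centers then seed K-means, so the number of clusters does not have to be chosen by hand.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Return a list of (center, members) canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (points this close to a center cannot seed a new canopy).
    Both values are assumptions to be tuned per dataset.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 belongs to this canopy (canopies may overlap).
        members = [p for p in points if distance(p, center) <= t1]
        canopies.append((center, members))
        # Drop points within t2, including the center itself.
        remaining = [p for p in remaining if distance(p, center) > t2]
    return canopies


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means (Lloyd's iteration) from the given initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
            (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    seeds = [center for center, _ in canopy_centers(data, t1=3.0, t2=1.5)]
    final_centroids, final_clusters = kmeans(data, seeds)
    print(len(seeds), [len(c) for c in final_clusters])
```

In a MapReduce setting, the assignment step would typically run in mappers and the centroid update in reducers; that distribution is left out of this sketch.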
Pages: 10017-10031
Page count: 14