Research and implementation of user clustering based on MapReduce in multimedia big data

Author
Tongke Fan
Affiliation
[1] Xi’an International University, School of Information and Network
Source
Keywords
Multimedia big data; Cloud computing; Hadoop; MapReduce; Clustering algorithm
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Massive data sets are hard to interpret and slow to cluster, which is a recurring problem in the big data context. To address it, a Canopy + K-means clustering algorithm is proposed, and the MapReduce programming model is used to exploit the computing and storage capacity of a Hadoop cluster. A case study on a large population of Taobao buyers is carried out with Mahout, the data mining library of the Hadoop platform, and a general procedure for mining with Mahout is given. The MapReduce-based clustering algorithm shows good clustering quality and operation speed. The Canopy + K-means algorithm is compared with the K-means algorithm in terms of runtime, speed-up ratio, and scalability; both algorithms are tested on clusters with different numbers of nodes and on datasets of various scales. The experimental results show that the Canopy + K-means algorithm runs faster than the K-means algorithm. Both show a good speed-up ratio in the Hadoop environment, with the Canopy + K-means algorithm performing considerably better than the K-means algorithm.
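The abstract does not spell out the algorithm, so the following is only a minimal single-machine sketch of the general Canopy + K-means idea, not the paper's MapReduce or Mahout implementation. A cheap canopy pass picks rough centers, which then seed ordinary K-means so that the number of clusters does not have to be guessed in advance. The function names, the thresholds `t1` and `t2`, and the sample points are all assumptions made for illustration.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def canopy_centers(points, t1, t2):
    """Canopy pass: loose threshold t1 defines membership, tight threshold
    t2 removes points from the candidate list (requires t1 > t2 >= 0)."""
    if not t1 > t2 >= 0:
        raise ValueError("expected t1 > t2 >= 0")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every point within t1 of the seed belongs to this canopy.
        members = [p for p in points if math.dist(p, seed) <= t1]
        centers.append(mean(members))
        # Points within t2 of the seed can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(p, seed) > t2]
    return centers


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Plain K-means (Lloyd iterations) started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical 2-D data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
seeds = canopy_centers(data, t1=3.0, t2=1.5)
final_centers, labels = kmeans(data, seeds)
print(len(seeds), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

In a MapReduce setting, the assignment step would typically run in mappers and the center update in reducers; that distribution is omitted here to keep the sketch short.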
Pages: 10017 - 10031
Page count: 14