Research and implementation of user clustering based on MapReduce in multimedia big data

Cited by: 0
Author
Tongke Fan
Affiliation
[1] Xi’an International University, School of Information and Network
Source
Keywords
Multimedia big data; Cloud computing; Hadoop; MapReduce; Clustering algorithm
DOI
Not available
CLC number
Subject classification number
Abstract
Massive data is poorly understood and clusters inefficiently in the big data context. To address this problem, a Canopy + K-means clustering algorithm is proposed, and the MapReduce programming model is used to make full use of the computing and storage capacity of a Hadoop cluster. A large population of buyers on Taobao is taken as the application context for a case study carried out with Mahout, the Hadoop platform's data mining toolkit. A general procedure for mining with Mahout is also given. The MapReduce-based clustering algorithm shows good clustering quality and running speed. Canopy + K-means is compared with plain K-means in terms of runtime, speed-up ratio, and scalability. Both algorithms are tested on clusters with different numbers of nodes and on datasets of various scales. The experimental results show that Canopy + K-means runs faster than K-means. Both algorithms achieve a good speed-up ratio in the Hadoop environment, and the speed-up ratio of Canopy + K-means is considerably better than that of K-means.
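The abstract does not spell out how the two stages fit together, so the following is only a minimal single-machine sketch of the general Canopy + K-means idea, not the paper's MapReduce or Mahout implementation. It assumes Euclidean distance and a simplified canopy pass that uses only the tight threshold; the names `canopy_centers`, `kmeans`, `assign`, and the parameter `t2` are hypothetical and chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2, seed=0):
    """Simplified canopy pass (assumption: only the tight threshold t2 is used).

    Repeatedly take one remaining point as a canopy center and discard every
    remaining point that lies within t2 of it. The number of centers found
    becomes the k for K-means, and the centers become its initial seeds.
    """
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd iterations) started from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable, so stop
            break
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
    seeds = canopy_centers(pts, t2=3.0)
    final_centers, clusters = kmeans(pts, seeds)
    print(len(seeds), [len(c) for c in clusters])
```

The point of the cheap canopy pass is that K-means no longer needs a hand-picked k or random initial centers. In a distributed setting the assignment step would typically run in mappers and the center update in reducers, which this sketch leaves out.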
Pages: 10017 / 10031
Page count: 14
Related papers
50 in total
  • [31] Hierarchical PSO Clustering on MapReduce for Scalable Privacy Preservation in Big Data
    Wai, Ei Nyein Chan
    Tsai, Pei-Wei
    Pan, Jeng-Shyang
    [J]. GENETIC AND EVOLUTIONARY COMPUTING, 2017, 536 : 36 - 44
  • [32] Research and Implementation of Efficient Parallel Processing of Big Data at TELBE User Facility
    Bawatna, Mohammed
    Green, Bertram
    Kovalev, Sergey
    Deinert, Jan-Christoph
    Knodel, Oliver
    Spallek, Rainer G.
    [J]. PROCEEDINGS OF THE 2019 SUMMER SIMULATION CONFERENCE (SUMMERSIM '19), 2019,
  • [33] Research and Implementation of Efficient Parallel Processing of Big Data at TELBE User Facility
    Bawatna, Mohammed
    Green, Bertram
    Kovalev, Sergey
    Deinert, Jan-Christoph
    Knodel, Oliver
    Spallek, Rainer G.
    [J]. 2019 INTERNATIONAL SYMPOSIUM ON PERFORMANCE EVALUATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (SPECTS), 2019,
  • [34] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    [J]. 9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
  • [35] Utilizing the Buckshot Algorithm for Efficient Big Data Clustering in the MapReduce Model
    Gerakidis, Sergios
    Mamalis, Basilis
    [J]. PROCEEDINGS OF THE 23RD PAN-HELLENIC CONFERENCE OF INFORMATICS (PCI 2019), 2019, : 112 - 117
  • [36] Research on Mobile Terminal User Identification Based on Big Data
    Wang, Zhenhua
    Yu, Yangsen
    Xue, Tao
    [J]. PROCEEDINGS OF THE 3RD WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY (WARTIA 2017), 2017, 148 : 210 - 213
  • [37] Web-based Multimedia Research and Indexation for Big Data Databases
    Belarbi, Mohammed Amin
    Mahmoudi, Said
    Belalem, Ghalem
    Mahmoudi, Sidi Ahmed
    [J]. PROCEEDINGS OF 2017 3RD INTERNATIONAL CONFERENCE OF CLOUD COMPUTING TECHNOLOGIES AND APPLICATIONS (CLOUDTECH), 2017, : 159 - 165
  • [38] Research on Computer Vision Image Multimedia Technology Based on Big Data
    Chang, Runze
    [J]. PROCEEDINGS OF 2020 IEEE 2ND INTERNATIONAL CONFERENCE ON CIVIL AVIATION SAFETY AND INFORMATION TECHNOLOGY (ICCASIT), 2020, : 424 - 427
  • [39] Novel Metaknowledge-based Processing Technique for Multimedia Big Data clustering challenges
    Bari, Nima
    Vichr, Roman
    Kowsari, Kamran
    Berkovich, Simon Y.
    [J]. 2015 1ST IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2015, : 204 - 207
  • [40] Research on MapReduce-based fuzzy associative classifier for big probabilistic numerical data
    Pei, Bin
    Wang, Fenmei
    Wang, Xiuzhen
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA), 2016, : 903 - 906