Parallel Implementation of Classification Algorithms Based on MapReduce

Cited by: 0
Authors
He, Qing [1]
Zhuang, Fuzhen [1]
Li, Jincheng [1]
Shi, Zhongzhi [1]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
Source
Keywords
Data Mining; Classification; Parallel Implementation; Large Dataset; MapReduce
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data mining has attracted extensive research for several decades. As an important task of data mining, classification plays a key role in information retrieval, web search, customer relationship management (CRM), and other applications. Most existing classification techniques are serial and become impractical for large datasets: computing resources are under-utilized and execution times are unacceptably long. Building on the MapReduce programming model, we propose parallel implementations of several classification algorithms, including k-nearest neighbors, the naive Bayesian model, and decision trees. Preliminary experiments show that the proposed parallel methods can not only process large datasets but can also be scaled out to run on a cluster, which significantly improves efficiency.
Pages: 655-662
Page count: 8