Naive Bayes Classifier Based Partitioner for MapReduce

Cited by: 2
Authors
Chen, Lei [1]
Lu, Wei [1]
Bao, Ergude [1]
Wang, Liqiang [2]
Xing, Weiwei [1]
Cai, Yuanyuan [3]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[3] Beijing Technol & Business Univ, Sch Comp & Informat Engn, Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
MapReduce; Hadoop; data locality; data skew; naive Bayes; bandwidth; job type; LOCALITY; SYSTEM
DOI
10.1587/transfun.E101.A.778
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located. Data skew will lead to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results will be sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies considering both essential issues can be divided into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, all these studies ignore the fact that for different types of jobs, the priority of data locality and data skew on the reduce side may produce different effects on the execution time. In this paper, we propose a naive Bayes classifier based partitioner, namely, BAPM, which achieves better performance because it can automatically choose the proper algorithm (LEEN or CLP) by leveraging the naive Bayes classifier, i.e., considering job type and bandwidth as classification attributes. Our experiments are performed in a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Further, compared with other popular algorithms, under specific bandwidths, the improvement BAPM achieved is up to 31.31%.
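The selection step described in the abstract can be illustrated with a minimal sketch: a categorical naive Bayes classifier that takes job type and bandwidth as attributes and outputs which partitioning algorithm (LEEN or CLP) to use. This is not the authors' implementation. The class name, the attribute values (`shuffle_heavy`, `reduce_heavy`, `low`, `high`), and the six training rows are hypothetical placeholders chosen only to make the example run; they are not measurements from the paper.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, label)][value] -> occurrences
        self.value_counts = defaultdict(Counter)
        # distinct values observed for each attribute
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        """Unnormalized log posterior for each class."""
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)  # log prior
            for i, value in enumerate(row):
                num = self.value_counts[(i, c)][value] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[i])
                score += math.log(num / den)  # log likelihood of attribute i
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical training set: (job type, bandwidth) -> faster algorithm.
# These rows are invented for illustration, not taken from the paper.
TRAINING = [
    (("shuffle_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "low"), "LEEN"),
    (("reduce_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "high"), "CLP"),
    (("reduce_heavy", "high"), "CLP"),
    (("reduce_heavy", "high"), "CLP"),
]

model = CategoricalNaiveBayes().fit(
    [attrs for attrs, _ in TRAINING],
    [algo for _, algo in TRAINING],
)

if __name__ == "__main__":
    for job in [("shuffle_heavy", "low"), ("reduce_heavy", "high")]:
        print(job, "->", model.predict(job))
```

In a real deployment the training rows would come from measured execution times of each algorithm per job type and bandwidth setting, and the predicted label would decide which partitioner the job is configured with.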
Pages: 778-786
Number of pages: 9