Naive Bayes Classifier Based Partitioner for MapReduce

Cited by: 2
Authors
Chen, Lei [1]
Lu, Wei [1]
Bao, Ergude [1]
Wang, Liqiang [2]
Xing, Weiwei [1]
Cai, Yuanyuan [3]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[3] Beijing Technol & Business Univ, Sch Comp & Informat Engn, Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
MapReduce; hadoop; data locality; data skew; naive Bayes; bandwidth; job type; LOCALITY; SYSTEM;
DOI
10.1587/transfun.E101.A.778
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located. Data skew will lead to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results will be sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies considering both essential issues can be divided into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, all these studies ignore the fact that for different types of jobs, the priority of data locality and data skew on the reduce side may produce different effects on the execution time. In this paper, we propose a naive Bayes classifier based partitioner, namely, BAPM, which achieves better performance because it can automatically choose the proper algorithm (LEEN or CLP) by leveraging the naive Bayes classifier, i.e., considering job type and bandwidth as classification attributes. Our experiments are performed in a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Further, compared with other popular algorithms, under specific bandwidths, the improvement BAPM achieved is up to 31.31%.
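The selection step described in the abstract, choosing between LEEN and CLP from the job type and the bandwidth, can be illustrated with a small categorical naive Bayes classifier. The sketch below is not the authors' implementation: the class `CategoricalNaiveBayes`, the helper `choose_partitioner`, the attribute values (`shuffle_heavy`, `compute_heavy`, `low`, `high`), and every training row are hypothetical and chosen only to show the mechanics. Laplace smoothing is likewise an assumption made here so that unseen attribute values do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumption, not from the paper)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[i][label][value] = number of training rows with that value
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        # distinct values seen for each attribute, used in the smoothing term
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[i][label][value] += 1
                self.values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Return unnormalized log posterior scores, one per class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for i, value in enumerate(row):
                count = self.value_counts[i][label][value]
                denom = n + self.alpha * len(self.values[i])
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical training records: (job type, bandwidth) -> faster algorithm.
# These rows are invented for illustration and are NOT results from the paper.
TRAINING = [
    (("shuffle_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "high"), "CLP"),
    (("compute_heavy", "low"), "LEEN"),
    (("compute_heavy", "high"), "CLP"),
    (("compute_heavy", "high"), "CLP"),
]

MODEL = CategoricalNaiveBayes().fit(
    [features for features, _ in TRAINING],
    [label for _, label in TRAINING],
)


def choose_partitioner(job_type, bandwidth):
    """Pick the partitioning algorithm name for a job (hypothetical helper)."""
    return MODEL.predict((job_type, bandwidth))


if __name__ == "__main__":
    print(choose_partitioner("shuffle_heavy", "low"))
    print(choose_partitioner("compute_heavy", "high"))
```

In a real deployment the training records would come from measured execution times of past jobs under each algorithm, and the bandwidth would need to be discretized into categories before classification; how the paper does either is not stated in this record.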
Pages: 778-786
Number of pages: 9