Naive Bayes Classifier Based Partitioner for MapReduce

Times cited: 2
Authors
Chen, Lei [1]
Lu, Wei [1]
Bao, Ergude [1]
Wang, Liqiang [2]
Xing, Weiwei [1]
Cai, Yuanyuan [3]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[3] Beijing Technol & Business Univ, Sch Comp & Informat Engn, Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
MapReduce; Hadoop; data locality; data skew; naive Bayes; bandwidth; job type; LOCALITY; SYSTEM
DOI
10.1587/transfun.E101.A.778
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
MapReduce is an effective framework for processing large datasets in parallel on a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality reduces network traffic by placing reduce tasks on the nodes where their input data resides, while data skew leads to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map outputs are sent; an effective partitioner can therefore improve MapReduce performance by increasing data locality and reducing data skew on the reduce side. Previous studies that address both issues fall into two categories: those that prioritize data locality, such as LEEN, and those that prioritize load balance, such as CLP. However, these studies ignore the fact that, for different types of jobs, prioritizing data locality or load balance on the reduce side can affect execution time differently. In this paper, we propose BAPM, a naive Bayes classifier based partitioner that automatically chooses the appropriate algorithm (LEEN or CLP) using a naive Bayes classifier with job type and bandwidth as classification attributes. Experiments on a Hadoop cluster show that BAPM improves the computing performance of MapReduce: its selection accuracy reaches 95.15%, and, compared with other popular algorithms, it achieves an improvement of up to 31.31% under specific bandwidths.
Pages: 778-786
Number of pages: 9
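The abstract describes choosing between LEEN and CLP with a naive Bayes classifier whose attributes are job type and bandwidth. The following is a minimal sketch of that selection step under stated assumptions, not the authors' implementation: it uses a generic categorical naive Bayes with Laplace smoothing, and the job-type categories ("shuffle_heavy", "compute_heavy"), the bandwidth discretization ("low", "high"), and the training records are all hypothetical, invented for illustration rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        # feature_counts[label][attr_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # attr_index -> set of values seen during training
        self.feature_values = defaultdict(set)

    def fit(self, samples, labels):
        for x, y in zip(samples, labels):
            self.class_counts[y] += 1
            for i, v in enumerate(x):
                self.feature_counts[y][i][v] += 1
                self.feature_values[i].add(v)
        return self

    def log_posterior(self, x, y):
        """Unnormalized log P(y | x) = log P(y) + sum_i log P(x_i | y)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[y] / total)  # log prior
        for i, v in enumerate(x):
            n_values = len(self.feature_values[i])
            count = self.feature_counts[y][i][v]
            # Smoothing keeps unseen (label, value) pairs from zeroing the product.
            score += math.log((count + self.alpha) /
                              (self.class_counts[y] + self.alpha * n_values))
        return score

    def predict(self, x):
        return max(self.class_counts, key=lambda y: self.log_posterior(x, y))


# HYPOTHETICAL training records: (job_type, bandwidth_level) -> faster partitioner.
# Invented for illustration only; these are not measurements from the paper.
samples = [
    ("shuffle_heavy", "low"), ("shuffle_heavy", "low"), ("shuffle_heavy", "high"),
    ("compute_heavy", "high"), ("compute_heavy", "high"), ("compute_heavy", "low"),
]
labels = ["LEEN", "LEEN", "CLP", "CLP", "CLP", "LEEN"]

model = CategoricalNaiveBayes().fit(samples, labels)
print(model.predict(("shuffle_heavy", "low")))   # LEEN
print(model.predict(("compute_heavy", "high")))  # CLP
```

Scores are summed in log space so that products of many small conditional probabilities do not underflow; with only two attributes this matters little, but it is the usual way to implement naive Bayes.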