Naive Bayes Classifier Based Partitioner for MapReduce

Cited by: 2

Authors:

Chen, Lei [1]

Lu, Wei [1]

Bao, Ergude [1]

Wang, Liqiang [2]

Xing, Weiwei [1]

Cai, Yuanyuan [3]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China

[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA

[3] Beijing Technol & Business Univ, Sch Comp & Informat Engn, Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China

Source:

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES | 2018 / Vol. E101A / No. 5

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation

Keywords:

MapReduce; Hadoop; data locality; data skew; naive Bayes; bandwidth; job type; LOCALITY; SYSTEM

DOI:

10.1587/transfun.E101.A.778

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located. Data skew leads to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results are sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies that consider both issues fall into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, all these studies ignore the fact that, for different types of jobs, prioritizing data locality or prioritizing reduce-side data skew may affect the execution time differently. In this paper, we propose a naive Bayes classifier based partitioner, namely BAPM, which achieves better performance because it automatically chooses the proper algorithm (LEEN or CLP) with a naive Bayes classifier that uses job type and bandwidth as classification attributes. Our experiments are performed in a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Further, compared with other popular algorithms, under specific bandwidths, BAPM achieves an improvement of up to 31.31%.
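The selection step described in the abstract can be illustrated with a small categorical naive Bayes classifier. The following is a minimal sketch under stated assumptions, not the BAPM implementation: the attribute values (`shuffle_heavy`, `shuffle_light`, `low`, `high`), the function names `train` and `choose`, and every training sample are hypothetical placeholders, and Laplace smoothing is an added choice that the abstract does not mention.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training samples: ((job_type, bandwidth_level), better_algorithm).
# These values are invented for illustration and are NOT taken from the paper.
TRAINING = [
    (("shuffle_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "low"), "LEEN"),
    (("shuffle_heavy", "high"), "CLP"),
    (("shuffle_light", "low"), "LEEN"),
    (("shuffle_light", "high"), "CLP"),
    (("shuffle_light", "high"), "CLP"),
]


def train(samples):
    """Count class frequencies and per-class attribute-value frequencies."""
    label_counts = Counter(label for _, label in samples)
    # feature_counts[label][i][value]: how often attribute i took this value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each attribute
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return label_counts, feature_counts, values


def choose(model, features):
    """Return the label with the highest log-posterior for the given attributes."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(features):
            count = feature_counts[label][i][value]
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(choose(model, ("shuffle_heavy", "low")))
print(choose(model, ("shuffle_light", "high")))
```

In a real deployment the training samples would come from measured job runs labeled with whichever algorithm finished faster, and the chosen label would then decide which partitioning algorithm handles the map output.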


Pages: 778-786

Page count: 9
