New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

Cited by: 2

Authors:

Ji, Keungyeup [1]

Kwon, Youngmi [1]

Affiliations:

[1] Chungnam Natl Univ, Dept Radio & Informat Commun Engn, Daejeon 34134, South Korea

Source:

COMPUTER SYSTEMS SCIENCE AND ENGINEERING | 2023 / Vol. 45 / No. 01

Keywords:

Hadoop; Hadoop distributed file system (HDFS); MapReduce; configuration parameter; malicious email filtering; Naive Bayes

DOI:

10.32604/csse.2023.031270

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Because it is more economical to combine commodity hardware (medium servers or PCs) with a virtual environment into a single server resource and to filter malicious email with machine learning techniques, we used the Hadoop MapReduce framework with Naive Bayes for malicious email filtering. Naive Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using Naive Bayes, a supervised machine learning method, in a Hadoop framework with optimized performance; and with a Python program using Naive Bayes on a bare metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naive Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naive Bayes method.
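
The abstract's idea of training Naive Bayes through map and reduce steps can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Hadoop: it only simulates the map, shuffle (sort), and reduce stages in plain Python. The function names, the `__DOC__` marker key, the whitespace tokenizer, and the use of Laplace (add-one) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby

DOC_KEY = "__DOC__"  # hypothetical marker key used to count documents per label


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, so no token equals DOC_KEY.
    return text.lower().split()


def map_phase(label, text):
    """Mapper: emit ((label, word), 1) per token, plus one document count."""
    yield (label, DOC_KEY), 1
    for word in tokenize(text):
        yield (label, word), 1


def reduce_phase(pairs):
    """Reducer: sort by key (a stand-in for the shuffle) and sum each group."""
    counts = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[key] = sum(value for _, value in group)
    return counts


def train(docs):
    """Build per-label document counts and word counts from (label, text) pairs."""
    pairs = [kv for label, text in docs for kv in map_phase(label, text)]
    counts = reduce_phase(pairs)
    doc_counts = {}
    word_counts = defaultdict(Counter)
    for (label, word), count in counts.items():
        if word == DOC_KEY:
            doc_counts[label] = count
        else:
            word_counts[label][word] = count
    vocab = {w for wc in word_counts.values() for w in wc}
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch only.
SAMPLE_DOCS = [
    ("spam", "win money now"),
    ("spam", "cheap money offer"),
    ("ham", "meeting at noon"),
    ("ham", "project meeting notes"),
]
model = train(SAMPLE_DOCS)
print(classify("win cheap money", model))
print(classify("project meeting", model))
```

In an actual Hadoop deployment the mapper and reducer would run as separate distributed tasks and the framework would handle the shuffle; the sketch keeps everything in memory so that the counting logic is easy to follow.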


Pages: 201-214

Page count: 14
