New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

Cited by: 2
Authors
Ji, Keungyeup [1 ]
Kwon, Youngmi [1 ]
Affiliation
[1] Chungnam Natl Univ, Dept Radio & Informat Commun Engn, Daejeon 34134, South Korea
Source
Keywords
Hadoop; Hadoop Distributed File System (HDFS); MapReduce; configuration parameter; malicious email filtering; Naive Bayes
DOI
10.32604/csse.2023.031270
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As email grows in importance, the volume of malicious email is also increasing, and with it the need for malicious email filtering. Because it is more economical to combine commodity hardware (medium-sized servers or PCs) with a virtual environment into a single server resource and to filter malicious email using machine learning techniques, we used the Hadoop MapReduce framework together with Naive Bayes. Naive Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using Naive Bayes, a supervised machine learning method, in a Hadoop framework tuned for performance; and with a Python program applying Naive Bayes on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naive Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naive Bayes method.
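The general idea of training Naive Bayes with map and reduce steps can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: it runs in a single process rather than on Hadoop, the toy corpus and the names `TRAIN`, `map_phase`, `reduce_phase`, `train`, and `classify` are hypothetical, and the use of add-one (Laplace) smoothing is an assumption that the abstract does not state.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus; labels and messages are invented for illustration.
TRAIN = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize offer now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]


def map_phase(record):
    """Mapper: emit ((label, None), 1) once per document and ((label, word), 1) per token."""
    label, text = record
    pairs = [((label, None), 1)]
    pairs.extend(((label, word), 1) for word in text.lower().split())
    return pairs


def reduce_phase(pairs):
    """Reducer: sum the values that share a key, as a shuffle-and-reduce step would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(records):
    """Run the map and reduce steps, then split the totals into model tables."""
    pairs = [pair for record in records for pair in map_phase(record)]
    totals = reduce_phase(pairs)
    doc_counts = {}
    word_counts = defaultdict(Counter)
    for (label, word), count in totals.items():
        if word is None:
            doc_counts[label] = count
        else:
            word_counts[label][word] = count
    vocab = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing assumed)."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("cash prize now", model))
print(classify("monday meeting", model))
```

In a real Hadoop job the mapper and reducer would run as separate distributed tasks over HDFS input splits; here they are ordinary functions so that the counting logic can be read and tested in isolation.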
Pages: 201-214
Number of pages: 14