New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

Cited by: 2
Authors
Ji, Keungyeup [1]
Kwon, Youngmi [1]
Affiliation
[1] Chungnam Natl Univ, Dept Radio & Informat Commun Engn, Daejeon 34134, South Korea
Keywords
Hadoop; Hadoop Distributed File System (HDFS); MapReduce; configuration parameter; malicious email filtering; Naive Bayes
DOI
10.32604/csse.2023.031270
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As email grows in importance, the volume of malicious email grows with it, and so does the need to filter it. Combining commodity hardware (mid-range servers or PCs) with a virtual environment so that it serves as a single server resource is an economical way to filter malicious email with machine learning, so we used the Hadoop MapReduce framework together with Naive Bayes. Naive Bayes was selected because it ranks among the leading machine learning methods (Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. We filtered malicious email in two ways: with a MapReduce program applying Naive Bayes, a supervised machine learning method, on a performance-tuned Hadoop framework; and with a Python program applying Naive Bayes on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naive Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naive Bayes method.
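The idea of training Naive Bayes through map and reduce steps can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: the function names (`map_email`, `reduce_counts`, `train`, `classify`), the whitespace tokenizer, the Laplace smoothing, and the toy training data are all assumptions made for this example, and plain Python functions stand in for Hadoop's distributed map and reduce phases.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration only.
TRAIN = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize offer now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "project report for review"),
]

def map_email(label, text):
    """Map step: emit ((label, token), 1) pairs for one email.

    The token None is a sentinel that counts the email itself,
    so the reducer can also recover per-class document counts.
    """
    pairs = [((label, None), 1)]
    pairs += [((label, word), 1) for word in text.lower().split()]
    return pairs

def reduce_counts(mapped):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for pairs in mapped:
        for key, value in pairs:
            totals[key] += value
    return totals

def train(dataset):
    """Build per-class document counts, word counts, and the vocabulary."""
    counts = reduce_counts(map_email(label, text) for label, text in dataset)
    docs = {}
    words = defaultdict(Counter)
    for (label, token), n in counts.items():
        if token is None:
            docs[label] = n
        else:
            words[label][token] = n
    vocab = {w for counter in words.values() for w in counter}
    return docs, words, vocab

def classify(model, text):
    """Return the class with the highest log-posterior score."""
    docs, words, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in docs:
        total_words = sum(words[label].values())
        score = math.log(docs[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids log(0) for unseen words.
            score += math.log((words[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
```

With this toy model, `classify(model, "win a cheap prize")` selects the spam class and `classify(model, "project meeting on monday")` selects the ham class. In an actual Hadoop job the map and reduce functions would run as distributed tasks over email data stored in HDFS; the counting logic is the part this sketch is meant to show.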
Pages: 201-214
Page count: 14
Related papers
50 in total
  • [41] Engine gearbox fault diagnosis using empirical mode decomposition method and Naïve Bayes algorithm
    Kiran Vernekar
    Hemantha Kumar
    K V Gangadharan
    Sādhanā, 2017, 42 : 1143 - 1153
  • [43] Weighted naïve Bayes text classification algorithm based on improved distance correlation coefficient
    Shufen Ruan
    Baozhou Chen
    Kunfang Song
    Hongwei Li
    Neural Computing and Applications, 2022, 34 : 2729 - 2738
  • [44] Design of agricultural ontology based on levy flight distributed optimization and Naïve Bayes classifier
    Deepa Rajendran
    S Vigneshwari
    Sādhanā, 2021, 46
  • [45] Accuracy Evaluation of C4.5 and Naïve Bayes Classifiers Using Attribute Ranking Method
    S. Sivakumari
    R. Praveena Priyadarsini
    P. Amudha
    International Journal of Computational Intelligence Systems, 2009, 2 (1) : 60 - 68
  • [46] An efficient method for filtering image-based spam e-mail
    Nhung, Ngo Phuong
    Phuong, Tu Minh
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PROCEEDINGS, 2007, 4673 : 945 - 953
  • [47] Novel spam filtering method based on fuzzy adaptive particle swarm optimization
    Wang, Gang
    Liu, Yuan-Ning
    Zhang, Xiao-Xu
    Zhao, Zheng-Dong
    Zhu, Xiao-Dong
    Liu, Zhen
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2011, 41 (03) : 716 - 720
  • [48] Design and evaluation of a Bayesian-filter-based image spam filtering method
    Uemura, Masahiro
    Tabata, Toshihiro
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON INFORMATION SECURITY AND ASSURANCE, 2008 : 46 - 51
  • [49] Deviation-based spam-filtering method via stochastic approach
    Lee, Daekyung
    Lee, Mi Jin
    Kim, Beom Jun
    EPL, 2018, 121 (06)
  • [50] Two-step based hybrid feature selection method for spam filtering
    Wang, Youwei
    Liu, Yuanning
    Zhu, Xiaodong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 27 (06) : 2785 - 2796