New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

Cited by: 2
Authors
Ji, Keungyeup [1]
Kwon, Youngmi [1]
Affiliations
[1] Chungnam Natl Univ, Dept Radio & Informat Commun Engn, Daejeon 34134, South Korea
Source
Keywords
Hadoop; Hadoop Distributed File System (HDFS); MapReduce; configuration parameter; malicious email filtering; Naive Bayes
DOI
10.32604/csse.2023.031270
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As the importance of email grows, the volume of malicious email is also increasing, and so is the need for malicious email filtering. It is more economical to combine commodity hardware (mid-range servers or PCs) with a virtual environment, use it as a single server resource, and filter malicious email with machine learning techniques. We therefore used the Hadoop MapReduce framework together with Naive Bayes for malicious email filtering. Naive Bayes was selected because it is among the top machine learning methods (Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using Naive Bayes, a supervised machine learning method, in a Hadoop framework tuned for performance; and with a Python program using Naive Bayes on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naive Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naive Bayes method.
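The MapReduce formulation of Naive Bayes described in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: it simulates the map and reduce phases in a single process rather than on Hadoop, and it assumes whitespace tokenization and Laplace (add-one) smoothing. The function names `map_phase`, `reduce_phase`, `train`, and `classify`, and the `"__doc__"` counter key, are hypothetical.

```python
import math
from collections import defaultdict

def map_phase(label, text):
    """Mapper: emit a document counter, then one ((label, word), 1) pair per token."""
    yield (("__doc__", label), 1)
    for word in text.lower().split():
        yield ((label, word), 1)

def reduce_phase(pairs):
    """Reducer: sum the counts per key (stands in for Hadoop's shuffle and reduce)."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals

def train(samples):
    """Run the mapper over every (label, text) sample and reduce the emitted pairs."""
    pairs = (pair for label, text in samples for pair in map_phase(label, text))
    return reduce_phase(pairs)

def classify(totals, text, labels=("spam", "ham")):
    """Pick the label with the highest log-posterior under add-one smoothing."""
    vocab = {word for (first, word) in totals if first != "__doc__"}
    n_docs = sum(totals.get(("__doc__", lab), 0) for lab in labels)
    best_label, best_score = None, -math.inf
    for lab in labels:
        # Total number of tokens seen for this label.
        n_words = sum(c for (first, _), c in totals.items() if first == lab)
        # Log prior from the per-label document counts.
        score = math.log(totals.get(("__doc__", lab), 0) / n_docs)
        for word in text.lower().split():
            count = totals.get((lab, word), 0)
            score += math.log((count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = lab, score
    return best_label

# Toy training data, invented for illustration only.
samples = [
    ("spam", "win free money now"),
    ("spam", "free prize claim now"),
    ("ham", "meeting schedule for monday"),
    ("ham", "lunch at noon monday"),
]
totals = train(samples)
print(classify(totals, "free money"))
print(classify(totals, "meeting monday"))
```

On a real cluster the mapper and reducer would run as separate tasks over HDFS splits, and the Hadoop configuration parameters mentioned in the keywords would govern how that work is distributed; none of that tuning is modeled here.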
Pages: 201-214
Page count: 14
Related papers
  • [1] Spam message classification based on the naïve Bayes classification algorithm
    Ning, Bin
    Junwei, Wu
    Feng, Hu
    IAENG International Journal of Computer Science, 2019, 46 (01)
  • [2] Optimizing naïve bayes algorithm for SMS spam filtering on mobile phone to reduce the consumption of resources
    Bao L.-Q.
    Lv L.-X.
    Li J.-L.
    Bao, Li-Qun (baoliqun1983@163.com), 1600, Computer Society of the Republic of China (28): 174 - 183
  • [3] Naïve Bayes Classifier Model for Detecting Spam Mails
    Kumar S.
    Gupta K.
    Gupta M.
    Annals of Data Science, 2024, 11 (06) : 1887 - 1897
  • [4] A Novel Configuration Tuning Method Based on Feature Selection for Hadoop MapReduce
    Liu, Jun
    Tang, Sule
    Xu, Guangxia
    Ma, Chuang
    Lin, Mingwei
    IEEE ACCESS, 2020, 8 : 63862 - 63871
  • [5] Parallel naïve Bayes regression model-based collaborative filtering recommendation algorithm and its realisation on Hadoop for big data
    Wen S.
    Wang C.
    Li H.
    Zheng G.
    International Journal of Information Technology and Management, 2019, 18 (2-3) : 129 - 142
  • [6] Spam Filtering: Online Naive Bayes Based on TONE
    Guanglu Sun
    Hongyue Sun
    Yingcai Ma
    Yuewu Shen
    ZTE Communications, 2013, 11 (02) : 51 - 54
  • [7] Spam filtering algorithm based on AIS and Bayes network
    Ye, Jixiang
    Tan, Guanzheng
    Jisuanji Gongcheng/Computer Engineering, 2006, 32 (11): 26 - 28
  • [8] Extended naïve bayes for group based classification
    Samsudin, Noor Azah
    Bradley, Andrew P.
    Advances in Intelligent Systems and Computing, 2014, 287 : 497 - 506
  • [9] Word Embedding based Multinomial Naive Bayes Algorithm for Spam Filtering
    Kadam, Sumedh
    Gala, Aayush
    Gehlot, Pritesh
    Kurup, Aditya
    Ghag, Kranti
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [10] A Support Vector Machine based Naive Bayes Algorithm for Spam Filtering
    Feng, Weimiao
    Sun, Jianguo
    Zhang, Liguo
    Cao, Cuiling
    Yang, Qing
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,