A Support Vector Machine based Naive Bayes Algorithm for Spam Filtering

Cited by: 34
Authors
Feng, Weimiao [1 ]
Sun, Jianguo [2 ]
Zhang, Liguo [2 ]
Cao, Cuiling [2 ]
Yang, Qing [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Harbin Engn Univ, Dept Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
[3] Montana State Univ, Dept Comp Sci, Bozeman, MT 59717 USA
Keywords
Naive Bayes; support vector machine; SVM trimming technique; spam filtering;
DOI
10.1109/PCCC.2016.7820655
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Naive Bayes classifiers are widely used to filter spam emails. However, the strong independence assumptions between features limit their accuracy in identifying spam. To address this issue, we propose a support vector machine based naive Bayes (SVM-NB) filtering system. SVM-NB first constructs an optimal separating hyperplane that divides the samples in the training set into two categories. For samples located near the hyperplane, if they belong to different categories, one of them is eliminated from the training set. In this way, the dependence between samples is reduced and the entire training sample space is simplified. The naive Bayes algorithm is then trained on the trimmed training set and applied to classify emails in the test set. The SVM-NB system is evaluated with a dataset obtained from DATAMALL. Experimental results demonstrate that SVM-NB achieves higher spam-detection accuracy and faster classification speed.
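The three stages named in the abstract (fit a separating hyperplane, trim opposite-category samples near it, train naive Bayes on what remains) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the Pegasos-style sub-gradient training of a linear SVM, the `band` threshold that defines "near the hyperplane", the nearest-neighbour pairing rule used to decide which sample to drop, the multinomial naive Bayes with Laplace smoothing, the toy word-count data, and every function name.

```python
import math
import random

def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Assumed trainer: Pegasos-style sub-gradient descent, labels in {-1, +1}."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * decision(w, b, X[i]) < 1
            w = [wj * (1 - eta * lam) for wj in w]  # regularisation shrink
            if violated:  # hinge-loss step for margin violators
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def trim(X, y, w, b, band=1.0):
    """Assumed rule: inside |f(x)| < band, drop the nearest neighbour of a
    sample when that neighbour carries the opposite label."""
    near = [i for i in range(len(X)) if abs(decision(w, b, X[i])) < band]
    dropped = set()
    for i in near:
        if i in dropped:
            continue
        others = [j for j in near if j != i and j not in dropped]
        if not others:
            continue
        j = min(others, key=lambda k: sum((p - q) ** 2 for p, q in zip(X[i], X[k])))
        if y[i] != y[j]:
            dropped.add(j)
    keep = [i for i in range(len(X)) if i not in dropped]
    return [X[i] for i in keep], [y[i] for i in keep]

def train_nb(X, y):
    """Multinomial naive Bayes with Laplace smoothing on count features."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        counts = [sum(col) + 1 for col in zip(*rows)]
        total = sum(counts)
        model[c] = (math.log(len(rows) / len(X)),
                    [math.log(k / total) for k in counts])
    return model

def predict_nb(model, x):
    """Return the class with the highest log-posterior score."""
    return max(model, key=lambda c: model[c][0]
               + sum(xi * lp for xi, lp in zip(x, model[c][1])))

# Toy data (invented): rows are [count of "free", count of "meeting"];
# +1 = spam, -1 = ham.
X = [[3, 0], [2, 0], [2, 1], [1, 2], [0, 2], [0, 3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
X_trim, y_trim = trim(X, y, w, b)
model = train_nb(X_trim, y_trim)
print(len(X), "->", len(X_trim), "samples; [3, 0] ->", predict_nb(model, [3, 0]))
```

Keeping `trim` separate from the SVM trainer means the trimming rule can be tried with any hyperplane, for example one fitted by an off-the-shelf SVM library, without touching the naive Bayes stage.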
Pages: 8
Related papers
50 in total
  • [1] Word Embedding based Multinomial Naive Bayes Algorithm for Spam Filtering
    Kadam, Sumedh
    Gala, Aayush
    Gehlot, Pritesh
    Kurup, Aditya
    Ghag, Kranti
    [J]. 2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [2] Spam Filtering: Online Naive Bayes Based on TONE
    Guanglu Sun
    Hongyue Sun
    Yingcai Ma
    Yuewu Shen
    [J]. ZTE Communications, 2013, 11 (02) : 51 - 54
  • [3] An innovative spam filtering model based on support vector machine
    Islam, Md. Rafiqul
    Chowdhury, Morshed U.
    Zhou, Wanlei
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 2, PROCEEDINGS, 2006, : 348 - +
  • [4] Detecting Spam Emails/SMS Using Naive Bayes, Support Vector Machine and Random Forest
    Goswami, Vasudha
    Malviya, Vijay
    Sharma, Pratyush
    [J]. INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, 2020, 46 : 608 - 615
  • [5] An Approach to Develop a Hybrid Algorithm Based on Support Vector Machine and Naive Bayes for Anomaly Detection
    Shakya, Subarna
    Sigdel, Sandeep
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017, : 323 - 327
  • [6] Understanding of the Naive Bayes Classifier in Spam Filtering
    Wei, Qijia
    [J]. 6TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION (CDMMS 2018), 2018, 1967
  • [7] Comparison of Performance Support Vector Machine Algorithm and Naive Bayes for Diabetes Diagnosis
    Watomakin, Dominikus Boli
    Emanuel, Andi Wahju Rahardjo
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH): EMBRACING INDUSTRY 4.0 - TOWARDS INNOVATION IN CYBER PHYSICAL SYSTEM, 2019, : 89 - 94
  • [8] Analysis of Naive Bayes Algorithm for Email Spam Filtering across Multiple Datasets
    Rusland, Nurul Fitriah
    Wahid, Norfaradilla
    Kasim, Shahreen
    Hafit, Hanayanti
    [J]. INTERNATIONAL RESEARCH AND INNOVATION SUMMIT (IRIS2017), 2017, 226
  • [9] Research on spam filtering technology using Support Vector Machine
    Mei, Zheng
    Ji, Geng
    Xiao, Li
    Qiao, Liu
    [J]. 2007 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS PROCEEDINGS, VOLS 1 AND 2: VOL 1: COMMUNICATION THEORY AND SYSTEMS; VOL 2: SIGNAL PROCESSING, COMPUTATIONAL INTELLIGENCE, CIRCUITS AND SYSTEMS, 2007, : 492 - +
  • [10] An SMS Spam Filtering System Using Support Vector Machine
    Joe, Inwhee
    Shim, Hyetaek
    [J]. FUTURE GENERATION INFORMATION TECHNOLOGY, 2010, 6485 : 577 - 584