An evaluation of Naive Bayes variants in content-based learning for spam filtering

Cited by: 23
Author
Seewald, Alexander K. [1]
Affiliation
[1] Seewald Solut, A-1180 Vienna, Austria
Keywords
empirical study; spam filtering; machine learning; Naive Bayes
DOI
10.3233/IDA-2007-11505
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two extended variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two extended variants of Naive Bayes learning, SA-Train and CRM114, were superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the performance of these systems was remarkably similar and that the extended systems have significant weaknesses which are not apparent for the simpler Naive Bayes learner. The simpler Naive Bayes learner, SpamBayes, also offers the most stable performance in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants.
Pages: 497-524
Page count: 28
Related papers
50 in total
  • [1] Content-Based Spam Filtering
    Almeida, Tiago A.
    Yamakami, Akebo
    [J]. 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [2] Spam Filtering: Online Naive Bayes Based on TONE
    Guanglu Sun
    Hongyue Sun
    Yingcai Ma
    Yuewu Shen
    [J]. ZTE Communications, 2013, 11 (02) : 51 - 54
  • [3] An Overview of Content-Based Spam Filtering Techniques
    Khorsi, Ahmed
    [J]. INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2007, 31 (03) : 269 - 277
  • [4] Understanding of the Naive Bayes Classifier in Spam Filtering
    Wei, Qijia
    [J]. 6TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION (CDMMS 2018), 2018, 1967
  • [5] SDAI: An integral evaluation methodology for content-based spam filtering models
    Perez-Diaz, Noemi
    Ruano-Ordas, David
    Fdez-Riverola, Fiorentino
    Mendez, Jose R.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (16) : 12487 - 12500
  • [6] Content-based Approach for Vietnamese Spam SMS Filtering
    Pham, Thai-Hoang
    Le-Hong, Phuong
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 41 - 44
  • [7] Word Embedding based Multinomial Naive Bayes Algorithm for Spam Filtering
    Kadam, Sumedh
    Gala, Aayush
    Gehlot, Pritesh
    Kurup, Aditya
    Ghag, Kranti
    [J]. 2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [8] A Support Vector Machine based Naive Bayes Algorithm for Spam Filtering
    Feng, Weimiao
    Sun, Jianguo
    Zhang, Liguo
    Cao, Cuiling
    Yang, Qing
    [J]. 2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [9] Content-based concept drift detection for Email spam filtering
    Zi Hayat, Morteza
    Basiri, Javad
    Seyedhossein, Leila
    Shakery, Azadeh
    [J]. 2010 5th International Symposium on Telecommunications, IST 2010, 2010, : 531 - 536
  • [10] Spam Filtering using Association Rules and Naive Bayes Classifier
    Yang, Tianda
    Qian, Kai
    Lo, Dan Chia-Tien
    Al Nasr, Kamal
    Qian, Ying
    [J]. PROCEEDINGS OF 2015 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (IEEE PIC), 2015, : 638 - 642