An evaluation of Naive Bayes variants in content-based learning for spam filtering

Cited by: 23
Authors
Seewald, Alexander K. [1]
Affiliation
[1] Seewald Solut, A-1180 Vienna, Austria
Keywords
empirical study; spam filtering; machine learning; Naive Bayes
DOI
10.3233/IDA-2007-11505
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two extended variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two extended variants of Naive Bayes learning, SA-Train and CRM114, were superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the performance of these systems was remarkably similar and that the extended systems have significant weaknesses which are not apparent for the simpler Naive Bayes learner. The simpler Naive Bayes learner, SpamBayes, also offers the most stable performance in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants.
Pages: 497-524
Page count: 28
Related papers
50 in total
  • [31] A comparison of event models for naive Bayes anti-spam e-mail filtering
    Schneider, KM
    [J]. EACL 2003: 10TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2003, : 307 - 314
  • [32] Transfer Naive Bayes Learning using Augmentation and Stacking for SMS Spam Detection
    Ulus, Cihan
    Wang, Zhiqiang
    Iqbal, Sheikh M. A.
    Khan, K. Md. Salman
    Zhu, Xingquan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG), 2022, : 275 - 282
  • [33] An approach to spam detection by Naive Bayes ensemble based on decision induction
    Yang, Zhen
    Nie, Xiangfei
    Xu, Weiran
    Guo, Jun
    [J]. ISDA 2006: SIXTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 2, 2006, : 861 - +
  • [34] Research On Spam Filter Based On Improved Naive Bayes and KNN Algorithm
    Ren, Biyi
    Shi, Yuliang
    [J]. PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON MACHINERY, MATERIALS AND COMPUTING TECHNOLOGY, 2016, 60 : 1113 - 1116
  • [35] Lazy associative classification for content-based spam detection
    Veloso, Adriano
    Meira, Wagner, Jr.
    [J]. LA-WEB 06: FOURTH LATIN AMERICAN WEB CONGRESS, PROCEEDINGS, 2006, : 154 - +
  • [36] Content-based analysis to detect Arabic web spam
    Al-Kabi, Mohammed
    Wahsheh, Heider
    Alsmadi, Izzat
    Al-Shawakfa, Emad
    Wahbeh, Abdullah
    Al-Hmoud, Ahmed
    [J]. JOURNAL OF INFORMATION SCIENCE, 2012, 38 (03) : 284 - 296
  • [37] A Collaborative Filtering Approach Based on Naive Bayes Classifier
    Valdiviezo-Diaz, Priscila
    Ortega, Fernando
    Cobos, Eduardo
    Lara-Cabrera, Raul
    [J]. IEEE ACCESS, 2019, 7 : 108581 - 108592
  • [39] Breaking and Fixing Content-Based Filtering
    Dhiman, Mayank
    Jakobsson, Markus
    Yen, Ting-Fang
    [J]. PROCEEDINGS OF THE 2017 APWG SYMPOSIUM ON ELECTRONIC CRIME RESEARCH (ECRIME), 2017, : 52 - 56
  • [40] Content-based image filtering for recommendation
    Jung, Kyung-Yong
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2006, 4203 : 312 - 321