Applying lazy learning algorithms to tackle concept drift in spam filtering

Cited by: 67
Authors
Fdez-Riverola, F.
Iglesias, E. L.
Diaz, F.
Mendez, J. R.
Corchado, J. M.
Affiliations
[1] Univ Vigo, Dept Informat, Escuela Super Ingn Informat, Orense 32004, Spain
[2] Univ Valladolid, Escuela Univ Informat, Dept Informat, Segovia 40005, Spain
[3] Univ Salamanca, Dept Informat & Automat, E-37008 Salamanca, Spain
Keywords
IBR system; concept drift; anti-spam filtering; model evaluation;
DOI
10.1016/j.eswa.2006.04.011
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A great number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, in many real-world applications the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to adapt to local concept drift. Concept drift in spam is particularly difficult to handle because spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mails) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques able to select both the terms and the e-mails that are representative of the current situation. The enhanced system is evaluated against other well-known, successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The results obtained from the experiments are very promising and support the idea that instance-based reasoning systems can offer a number of advantages in tackling concept drift in dynamic problems, as in the case of the anti-spam filtering domain. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 36-48
Number of pages: 13