Applying lazy learning algorithms to tackle concept drift in spam filtering

Cited by: 67

Authors:

Fdez-Riverola, F.

Iglesias, E. L.

Diaz, F.

Mendez, J. R.

Corchado, J. M.

Affiliations:

[1] Univ Vigo, Dept Informat, Escuela Super Ingn Informat, Orense 32004, Spain

[2] Univ Valladolid, Escuela Univ Informat, Dept Informat, Segovia 40005, Spain

[3] Univ Salamanca, Dept Informat & Automat, E-37008 Salamanca, Spain

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2007 / Vol. 33 / Issue 01

Keywords:

IBR system; concept drift; anti-spam filtering; model evaluation;

DOI:

10.1016/j.eswa.2006.04.011

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A large number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, the difficulty in many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to adapt to local concept drift. Concept drift in spam is particularly difficult to handle, as spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mails) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved in order to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques able to select both terms and e-mails representative of the current situation. The enhanced system is evaluated against other well-known successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The results obtained from the experiments are very promising and support the idea that instance-based reasoning systems can offer a number of advantages in tackling concept drift in dynamic problems, as in the case of the anti-spam filtering domain. © 2006 Elsevier Ltd. All rights reserved.

Pages: 36 - 48

Number of pages: 13

50 records in total

[1] An efficient incremental learning mechanism for tracking concept drift in spam filtering
Sheu, Jyh-Jian
Chu, Ko-Tsung
Li, Nien-Feng
Lee, Cheng-Chi
PLOS ONE, 2017, 12 (02):
[2] Content-based concept drift detection for Email spam filtering
Zi Hayat M.
Basiri J.
Seyedhossein L.
Shakery A.
2010 5th International Symposium on Telecommunications, IST 2010, 2010, : 531 - 536
[3] A case-based technique for tracking concept drift in spam filtering
Delany, SJ
Cunningham, P
Tsymbal, A
Coyle, L
APPLICATIONS AND INNOVATIONS IN INTELLIGENT SYSTEMS XII, PROCEEDINGS, 2005, : 3 - 16
[4] A case-based technique for tracking concept drift in spam filtering
Delany, SJ
Cunningham, P
Tsymbal, A
Coyle, L
KNOWLEDGE-BASED SYSTEMS, 2005, 18 (4-5) : 187 - 195
[5] Architecture of adaptive spam filtering based on machine learning algorithms
Islam, Md Rafiqul
Zhou, Wanlei
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PROCEEDINGS, 2007, 4494 : 458 - +
[6] SMS Spam Filtering using Supervised Machine Learning Algorithms
Navaney, Pavas
Dubey, Gaurav
Rana, Ajay
PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE CONFLUENCE 2018 ON CLOUD COMPUTING, DATA SCIENCE AND ENGINEERING, 2018, : 43 - 48
[7] ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift
Delany, Sarah Jane
Cunningham, Padraig
Smyth, Barry
ECAI 2006, PROCEEDINGS, 2006, 141 : 627 - +
[8] Recurring Concept Detection for Spam Filtering
Angel Abad, Miguel
Bartolo Gomes, Joao
Menasalvas, Ernestina
2014 17TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2014,
[9] Discretisation in Lazy Learning Algorithms
Kai Ming Ting
Artificial Intelligence Review, 1997, 11 : 157 - 174
[10] Discretisation in lazy learning algorithms
Ting, KM
ARTIFICIAL INTELLIGENCE REVIEW, 1997, 11 (1-5) : 157 - 174