Catching the drift: Using feature-free case-based reasoning for spam filtering

Cited by: 0
Authors
Delany, Sarah Jane [1 ]
Bridge, Derek [2 ]
Affiliations
[1] Dublin Institute of Technology, Dublin, Ireland
[2] University College Cork, Cork, Ireland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we compare case-based spam filters, focusing on their resilience to concept drift. In particular, we evaluate how to track concept drift using a case-based spam filter that uses a feature-free distance measure based on text compression. In our experiments, we compare two ways to normalise such a distance measure, finding that the one proposed in [1] performs better. We show that a policy as simple as retaining misclassified examples has a hugely beneficial effect on handling concept drift in spam but, on its own, it results in the case base growing by over 30%. We then compare two different retention policies and two different forgetting policies (one a form of instance selection, the other a form of instance weighting) and find that they perform roughly as well as each other while keeping the case base size constant. Finally, we compare a feature-based textual case-based spam filter with our feature-free approach. In the face of concept drift, the feature-based approach requires the case base to be rebuilt periodically so that we can select a new feature set that better predicts the target concept. We find feature-free approaches to have lower error rates than their feature-based equivalents.
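The ideas in the abstract (a feature-free, compression-based distance, a k-nearest-neighbour case base, and a policy of retaining misclassified emails) can be illustrated with a small sketch. This is not the authors' implementation. It assumes zlib as the compressor and shows two normalisations that are well known in the compression-distance literature: the normalised compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and the compression-based dissimilarity measure, CDM(x, y) = C(xy) / (C(x) + C(y)). It makes no claim about which of these, if either, is the normalisation the paper cites as [1]. The names `clen`, `ncd`, `cdm`, `CaseBase`, `update` and `max_size` are hypothetical, and the oldest-first forgetting rule is a deliberate simplification, not the paper's instance-selection or instance-weighting policies.

```python
import zlib
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the zlib-compressed text (zlib is an assumed compressor)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalised compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: str, y: str) -> float:
    """Compression-based dissimilarity measure: C(xy) / (C(x) + C(y))."""
    return clen(x + y) / (clen(x) + clen(y))


class CaseBase:
    """Hypothetical k-NN case base holding (email_text, label) pairs."""

    def __init__(self, cases, k=3, dist=ncd, max_size=None):
        self.cases = list(cases)
        self.k = k
        self.dist = dist
        self.max_size = max_size  # None lets the case base grow without limit

    def classify(self, text: str) -> str:
        # No feature extraction: rank stored emails by compression distance,
        # then take a majority vote over the labels of the k nearest.
        nearest = sorted(self.cases, key=lambda case: self.dist(text, case[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def update(self, text: str, true_label: str) -> str:
        """Classify an email, then retain it only if it was misclassified."""
        predicted = self.classify(text)
        if predicted != true_label:
            self.cases.append((text, true_label))
            # Simplistic forgetting (an assumption, not the paper's policies):
            # drop the oldest case so the size never exceeds max_size.
            if self.max_size is not None and len(self.cases) > self.max_size:
                self.cases.pop(0)
        return predicted
```

For example, `CaseBase([("a meeting", "ham"), ("lunch", "ham")], k=1)` misclassifies `"win money"` as ham when its true label is spam, so `update` retains that email and the case base grows from two cases to three. Passing `max_size=2` keeps the size constant instead by discarding the oldest case, which mirrors, in a very crude way, the abstract's point that retention alone grows the case base while a forgetting policy bounds it.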
Pages: 314 / +
Number of pages: 4
Related papers
(50 in total)
  • [1] Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches
    Delany, Sarah Jane
    Bridge, Derek
    Artificial Intelligence Review, 2006, 26 (1-2): 75 - 87
  • [2] Relaxing feature selection in spam filtering by using case-based reasoning systems
    Mendez, J. R.
    Fdez-Riverola, F.
    Glez-Pena, D.
    Diaz, F.
    Corchado, J. M.
    Progress in Artificial Intelligence, Proceedings, 2007, 4874: 53 - +
  • [3] An assessment of case-based reasoning for spam filtering
    Delany, Sarah Jane
    Cunningham, Pádraig
    Coyle, Lorcan
    Artificial Intelligence Review, 2005, 24 (3-4): 359 - 378
  • [4] A case-based technique for tracking concept drift in spam filtering
    Delany, Sarah Jane
    Cunningham, Pádraig
    Tsymbal, A.
    Coyle, Lorcan
    Applications and Innovations in Intelligent Systems XII, Proceedings, 2005: 3 - 16
  • [5] A case-based technique for tracking concept drift in spam filtering
    Delany, Sarah Jane
    Cunningham, Pádraig
    Tsymbal, A.
    Coyle, Lorcan
    Knowledge-Based Systems, 2005, 18 (4-5): 187 - 195
  • [6] Case-Based Reasoning with Feature Clustering
    Hong, Tzung-Pei
    Liou, Yan-Liang
    Proceedings of the Seventh IEEE International Conference on Cognitive Informatics, 2008: 449 - +
  • [7] Recommending Inferior Results: A General and Feature-Free Model for Spam Detection
    Liu, Yuli
    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020: 955 - 964
  • [8] A case-based reasoning approach to collaborative filtering
    Burke, R.
    Advances in Case-Based Reasoning, Proceedings, 2001, 1898: 370 - 379