Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches

Cited by: 6
Authors
Delany, Sarah Jane [2]
Bridge, Derek [1]
Affiliations
[1] Univ Coll Cork, Cork, Ireland
[2] Dublin Inst Technol, Dublin, Ireland
Keywords
spam filtering; case-based reasoning; case-base editing; case-based maintenance; feature selection; distance measures; text compression
DOI
10.1007/s10462-007-9041-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spam filtering is a text classification task to which Case-Based Reasoning (CBR) has been successfully applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advantages of having no set-up costs and being resilient to concept drift. We report an empirical comparison, which shows the feature-free approach to be more accurate than the feature-based system. These results are fairly robust over different compression algorithms in that we find that the accuracy when using a Lempel-Ziv compressor (GZip) is approximately the same as when using a statistical compressor (PPM). We note, however, that the feature-free systems take much longer to classify emails than the feature-based system. Improvements in the classification time of both kinds of systems can be obtained by applying case base editing algorithms, which aim to remove noisy and redundant cases from a case base while maintaining, or even improving, generalisation accuracy. We report empirical results using the Competence-Based Editing (CBE) technique. We show that CBE removes more cases when we use the distance measure based on text compression (without significant changes in generalisation accuracy) than it does when we use the feature-based approach.
Pages: 75-87
Number of pages: 13
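
The feature-free approach summarised in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for its compression-based distance, so the code below assumes the normalized compression distance (NCD), one widely used measure of this kind, and uses Python's `zlib` (DEFLATE, the Lempel-Ziv family that GZip belongs to) as the compressor. The function names, the toy case base, and the majority-vote k-NN rule are hypothetical choices for illustration, not the ECUE implementation.

```python
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (assumed measure, see lead-in).

    Texts that share content compress well together, so the
    concatenation costs little more than the larger text alone.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, case_base, k: int = 1) -> str:
    """Label a query by majority vote among its k nearest cases.

    case_base is a list of (text, label) pairs. No features are
    extracted, so there is no set-up step before classification.
    """
    neighbours = sorted(case_base, key=lambda case: ncd(query, case[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy case base; a real one would hold thousands of emails.
CASE_BASE = [
    ("Buy cheap pills now! Limited offer, click here to claim your free prize today.", "spam"),
    ("You have won a lottery! Send your bank details to claim the cash prize now.", "spam"),
    ("Hi team, the project meeting is moved to Tuesday at 10am in room 204.", "ham"),
    ("Please find attached the minutes from yesterday's review; comments welcome.", "ham"),
]

if __name__ == "__main__":
    email = "Buy cheap pills now! Limited offer, click here to claim your free prize."
    print(classify(email, CASE_BASE, k=1))
```

Every classification compresses the query together with each stored case, which is consistent with the abstract's remark that the feature-free systems classify more slowly, and with its point that removing cases through case base editing reduces classification time.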