Spam filtering using statistical data compression models

Cited by: 0
Authors
Department of Intelligent Systems, Jožef Stefan Institute, Jamova 39, Ljubljana, SI-1000, Slovenia [1 ]
Unknown [2]
Unknown [3]
Institution
Source
J. Mach. Learn. Res. | 2006 / pp. 2673-2698
Keywords
Adaptive filtering - Classification (of information) - Data compression - Electronic mail - Learning algorithms - Markov processes - Text processing
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Spam filtering poses a special problem in text categorization: its defining characteristic is that filters face an active adversary who constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updatable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
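The idea of using a compression model as a character-level classifier can be illustrated with a small sketch. The code below is not the paper's method: instead of dynamic Markov compression or prediction by partial matching, it assumes a plain fixed-order character model with add-one smoothing, and the names `CharModel`, `update`, `code_length` and `classify` are hypothetical. It trains one model per class and assigns a message to the class whose model would encode it in fewer bits, with no tokenization step.

```python
import math
from collections import defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing.

    Assumption: a simplified stand-in for an adaptive compression
    model; it is neither PPM nor DMC.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[context][next_char] -> number of times seen
        self.counts = defaultdict(lambda: defaultdict(int))
        # totals[context] -> number of characters seen after context
        self.totals = defaultdict(int)

    def update(self, text):
        """Incrementally add one training message to the model."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            ctx = padded[i - self.order:i]
            self.counts[ctx][padded[i]] += 1
            self.totals[ctx] += 1

    def code_length(self, text):
        """Return the bits needed to encode text, without updating."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx = padded[i - self.order:i]
            # .get avoids creating entries for unseen contexts
            count = self.counts.get(ctx, {}).get(padded[i], 0)
            total = self.totals.get(ctx, 0)
            p = (count + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model encodes the message more compactly."""
    spam_bits = spam_model.code_length(message)
    ham_bits = ham_model.code_length(message)
    return "spam" if spam_bits < ham_bits else "ham"


# Toy training data, invented for illustration only.
spam = CharModel()
ham = CharModel()
spam.update("cheap pills buy now, cheap pills online")
ham.update("meeting notes attached, see you on monday")

print(classify("buy cheap pills", spam, ham))
print(classify("see you on monday", spam, ham))
```

Because `update` only increments counts, a new labeled message can be folded into the model as user feedback arrives, which is the incremental property the abstract emphasizes.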
Related papers
50 in total
  • [41] Email Spam Filtering using BPNN Classification Algorithm
    Tuteja, Simranjit Kaur
    Bogiri, Nagaraju
    2016 INTERNATIONAL CONFERENCE ON AUTOMATIC CONTROL AND DYNAMIC OPTIMIZATION TECHNIQUES (ICACDOT), 2016, : 915 - 919
  • [42] A study of spam filtering using support vector machines
    Ola Amayri
    Nizar Bouguila
    Artificial Intelligence Review, 2010, 34 : 73 - 108
  • [43] Intelligent SMS Spam Filtering Using Topic Model
    Ma, Jialin
    Zhang, Yongjun
    Liu, Jinling
    Yu, Kun
    Wang, XuAn
    2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS), 2016, : 380 - 383
  • [44] Adaptive spam filtering using dynamic feature space
    Zhou, Y
    Mulekar, MS
    Nerellapalli, P
    ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 302 - 309
  • [45] Anti-spam filtering using neural networks
    Elfayoumy, S
    Yang, Y
    Ahuja, S
    IC-AI '04 & MLMTA'04, VOL 1 AND 2, PROCEEDINGS, 2004, : 984 - 989
  • [46] A study of spam filtering using support vector machines
    Amayri, Ola
    Bouguila, Nizar
    ARTIFICIAL INTELLIGENCE REVIEW, 2010, 34 (01) : 73 - 108
  • [47] Spam mail filtering system using semantic enrichment
    Kim, HJ
    Kim, HN
    Jung, JJ
    Jo, GS
    WEB INFORMATION SYSTEMS - WISE 2004, PROCEEDINGS, 2004, 3306 : 619 - 628
  • [48] Time-efficient spam e-mail filtering using n-gram models
    Ciltik, Ali
    Gungor, Tunga
    PATTERN RECOGNITION LETTERS, 2008, 29 (01) : 19 - 33
  • [49] Parameterless data compression and noise filtering using association rule mining
    Woon, YK
    Li, X
    Ng, WK
    Lu, WF
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2003, 2737 : 278 - 287
  • [50] Compression of volumetric data sets using motion compensated temporal filtering
    Redondo, R
    Barbarien, J
    Munteanu, A
    Cristóbal, G
    Schelkens, P
    WAVELET APPLICATIONS IN INDUSTRIAL PROCESSING II, 2004, 5607 : 118 - 126