Compression-based spam filter

Cited by: 6
Authors
Almeida, Tiago A. [1]
Yamakami, Akebo [2]
Affiliations
[1] Fed Univ Sao Carlos UFSCar, Dept Comp Sci, BR-18052780 Sorocaba, SP, Brazil
[2] Univ Campinas UNICAMP, Sch Elect & Comp Engn, BR-13083970 Campinas, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
compression-based model; spam filter; text categorization; knowledge-based system; machine learning; CLASSIFICATION;
DOI
10.1002/sec.639
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Nowadays, e-mail spam is not a novelty, but it is still an important problem with a high impact on the economy. Spam filtering poses a special problem in text categorization, in which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. In this paper, we present a novel approach to spam filtering based on a compression-based model. We have conducted an empirical experiment on eight public and real non-encoded datasets. The results indicate that the proposed filter is fast to construct, is incrementally updateable, and clearly outperforms established spam classifiers. Copyright (c) 2012 John Wiley & Sons, Ltd.
Pages: 327-335
Number of pages: 9
Related papers
50 in total
  • [41] Effective Construction of Compression-based Feature Space
    Koga, Hisashi
    Nakajima, Yuji
    Toda, Takahisa
    PROCEEDINGS OF 2016 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA 2016), 2016, : 116 - 120
  • [42] Compression-based data mining of sequential data
    Keogh, Eamonn
    Lonardi, Stefano
    Ratanamahatana, Chotirat Ann
    Wei, Li
    Lee, Sang-Hee
    Handley, John
    DATA MINING AND KNOWLEDGE DISCOVERY, 2007, 14 (01) : 99 - 129
  • [43] Compression-based averaging of selective naive Bayes classifiers
    Boulle, Marc
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 1659 - 1685
  • [44] Level compression-based image representation and its applications
    Chung, KL
    Hong, KB
    PATTERN RECOGNITION, 1998, 31 (03) : 327 - 332
  • [45] Competitive Author Profiling Using Compression-Based Strategies
    Claude, Francisco
    Galaktionov, Daniil
    Konow, Roberto
    Ladra, Susana
    Pedreira, Oscar
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2017, 25 : 5 - 20
  • [46] A Compression-Based Method for Detecting Anomalies in Textual Data
    de la Torre-Abaitua, Gonzalo
    Lago-Fernandez, Luis Fernando
    Arroyo, David
    ENTROPY, 2021, 23 (05)
  • [47] Compression-Based Document Length Prior for Language Models
    Parapar, Javier
    Losada, David E.
    Barreiro, Alvaro
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 652 - 653
  • [48] Text Classification Using Compression-Based Dissimilarity Measures
    Coutinho, David Pereira
    Figueiredo, Mario A. T.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (05)
  • [49] Neural Compression-Based Feature Learning for Video Restoration
    Huang, Cong
    Li, Jiahao
    Li, Bin
    Liu, Dong
    Lu, Yan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5862 - 5871
  • [50] Relevance of Contextual Information in Compression-Based Text Clustering
    Granados, Ana
    Martinez, Rafael
    Camacho, David
    de Borja Rodriguez, Francisco
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2010, 2010, 6283 : 259 - 266