Spam filtering using statistical data compression models

Cited by: 0
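Authors
Bratko, Andrej; Cormack, Gordon V.; Filipic, Bogdan; Lynam, Thomas R.; Zupan, Blaz
Affiliations
Department of Intelligent Systems, Jožef Stefan Institute, Jamova 39, Ljubljana, SI-1000, Slovenia [1]
Unknown [2]
Unknown [3]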
Source
J. Mach. Learn. Res. | 2006 / pp. 2673-2698
Keywords
Adaptive filtering - Classification (of information) - Data compression - Electronic mail - Learning algorithms - Markov processes - Text processing
DOI
Not available
Abstract
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. By modeling messages as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updatable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.
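
The idea of using a compression model as a character-level classifier can be illustrated with a minimal sketch. It is not the paper's method: instead of dynamic Markov compression or prediction by partial matching, it assumes a much simpler fixed-order character model with add-one smoothing. The class name `CharModel`, the `order` and `alphabet_size` parameters, and the toy training messages are all hypothetical. A message is assigned to the class whose model would encode it in fewer bits.

```python
import math
from collections import defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing.

    A simplified stand-in for PPM/DMC, assumed here for illustration only.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        # Assumed alphabet size used for smoothing (single-byte characters).
        self.alphabet_size = alphabet_size
        # counts[context][char] = times char followed context in training text
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def update(self, text):
        """Incrementally add one message to the model."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            self.counts[ctx][ch] += 1
            self.totals[ctx] += 1

    def code_length(self, text):
        """Bits needed to encode text; the model is not updated while scoring."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            ctx_counts = self.counts.get(ctx)  # .get avoids creating entries
            count = ctx_counts.get(ch, 0) if ctx_counts else 0
            total = self.totals.get(ctx, 0)
            p = (count + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model compresses the message best."""
    if spam_model.code_length(message) < ham_model.code_length(message):
        return "spam"
    return "ham"


# Toy training data (hypothetical); real filters train on many messages.
spam_model, ham_model = CharModel(), CharModel()
spam_model.update("buy cheap pills now, click here for a free offer")
ham_model.update("meeting moved to friday, please review the attached notes")

print(classify("cheap pills, click here", spam_model, ham_model))
print(classify("please review the notes", spam_model, ham_model))
```

No tokenizer appears anywhere in the sketch: the models see raw character sequences, and training is a single incremental `update` call per message, which matches the properties the abstract emphasizes.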
Related papers
50 in total
  • [1] Spam filtering using statistical data compression models
    Bratko, Andrej
    Cormack, Gordon V.
    Filipic, Bogdan
    Lynam, Thomas R.
    Zupan, Blaz
    JOURNAL OF MACHINE LEARNING RESEARCH, 2006, 7 : 2673 - 2698
  • [2] SPAM DETECTION USING DATA COMPRESSION AND SIGNATURES
    Prilepok, Michal
    Berek, Petr
    Platos, Jan
    Snasel, Vaclav
    CYBERNETICS AND SYSTEMS, 2013, 44 (6-7) : 533 - 549
  • [3] Sentiment polarity classification using statistical data compression models
    Ziegelmayer, Dominique
    Schrader, Rainer
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012 : 731 - 738
  • [4] Spam filtering using spam mail communities
    Deepak, P
    Parameswaran, S
    2005 SYMPOSIUM ON APPLICATIONS AND THE INTERNET, PROCEEDINGS, 2005 : 377 - 383
  • [6] Unsupervised Approach for Email Spam Filtering using Data Mining
    Manaa M.E.
    Obaid A.J.
    Dosh M.H.
    EAI Endorsed Transactions on Energy Web, 2021, 8 (36) : 1 - 6
  • [7] SMS spam filtering: Methods and data
    Delany, Sarah Jane
    Buckley, Mark
    Greene, Derek
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (10) : 9899 - 9908
  • [8] Leveraging Readability and Sentiment in Spam Review Filtering Using Transformer Models
    Kanmani S.
    Balasubramanian S.
    Computer Systems Science and Engineering, 2023, 45 (02) : 1439 - 1454
  • [9] Temporal Filtering of InSAR Data Using Statistical Parameters From NWP Models
    Gong, Wenyu
    Meyer, Franz J.
    Liu, Shizhuo
    Hanssen, Ramon F.
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2015, 53 (07) : 4033 - 4044
  • [10] Using Compression Models for Filtering Troll Comments
    de-la-Pena-Sordo, Jorge
    Pastor-Lopez, Iker
    Santos, Igor
    Bringas, Pablo G.
    PROCEEDINGS OF THE 2015 10TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, 2015 : 661 - 666