Spam filtering using Kolmogorov complexity analysis

Cited by: 2
Authors
Richard, G. [2]
Doncescu, A. [1]
Affiliations
[1] Univ Toulouse, CNRS, LAAS, Toulouse, France
[2] Univ Toulouse, IRIT, Toulouse, France
Keywords
spam; Kolmogorov complexity; compression; clustering; k-nearest neighbours
DOI
10.1504/IJWGS.2008.018500
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the most unwelcome side effects of e-commerce technology is the development of spamming as an e-marketing technique. Spam e-mails (unsolicited commercial e-mails) are a burden for everybody who has an electronic mailbox. Detecting and filtering spam is therefore a challenging task, and many approaches have been developed to identify spam before it reaches the end user's mailbox. In this paper, we focus on a relatively new approach founded on the work of A. Kolmogorov. The main idea is to give a formal meaning to the notion of 'information content' and to provide a measure of this content. With such a quantitative approach, it becomes possible to define a distance, which is a major tool for classification purposes. To validate our approach, we proceed in two steps. First, we apply the classical compression distance to a mix of spam and legitimate e-mails to check whether they can be properly clustered without any supervision. This turned out to be the case, which highlights a kind of underlying structure in spam e-mails. Second, we implemented a k-nearest neighbours algorithm that achieves an accuracy rate of 85%. Coupled with other anti-spam techniques, compression-based methods could be of great help in the spam filtering challenge.
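The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "classical compression distance" is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and that C is approximated by the zlib compressor; the record names neither the formula nor the compressor. The function names `compressed_size`, `ncd` and `knn_classify`, and the default k = 3, are hypothetical choices made for illustration only.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Approximate the Kolmogorov complexity of data by its zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings (assumed formula)."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(message: bytes, labelled: list[tuple[bytes, str]], k: int = 3) -> str:
    """Label a message by majority vote among its k nearest labelled e-mails under NCD."""
    neighbours = sorted(labelled, key=lambda item: ncd(message, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Calling `knn_classify(raw_email, training_set)` with a list of `(bytes, label)` pairs returns the majority label among the nearest neighbours. A general-purpose compressor such as zlib has a limited look-back window, so distances between long messages may be less reliable than between short ones.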
Pages: 136-148
Number of pages: 13