Spam filtering using Kolmogorov complexity analysis

Cited by: 2
Authors
Richard, G. [2]
Doncescu, A. [1]
Affiliations
[1] Univ Toulouse, CNRS, LAAS, Toulouse, France
[2] Univ Toulouse, IRIT, Toulouse, France
Keywords
spam; Kolmogorov complexity; compression; clustering; k-nearest neighbours
DOI
10.1504/IJWGS.2008.018500
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
One of the most undesirable side effects of e-commerce technology is the rise of spamming as an e-marketing technique. Spam e-mails (unsolicited commercial e-mails) are a burden for everybody with an electronic mailbox. Detecting and filtering spam is therefore a challenging task, and many approaches have been developed to identify spam before it reaches the end user's mailbox. In this paper, we focus on a relatively new approach founded on the work of A. Kolmogorov. The main idea is to give a formal meaning to the notion of 'information content' and to provide a measure of this content. With such a quantitative measure, it becomes possible to define a distance, which is a major tool for classification. To validate our approach, we proceed in two steps. First, we apply the classical compression distance to a mix of spam and legitimate e-mails to check whether they can be properly clustered without any supervision. This turns out to be the case, which highlights a kind of underlying structure in spam e-mails. Second, we implement a k-nearest neighbours algorithm that achieves an accuracy rate of 85%. Coupled with other anti-spam techniques, compression-based methods could be of great help in the spam filtering challenge.
Pages: 136-148
Page count: 13
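The abstract describes a compression-based distance combined with k-nearest neighbours classification. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes the normalized compression distance (NCD) as the "classical compression distance" and uses zlib as a stand-in compressor, neither of which is specified in the abstract. The function names `compressed_size`, `ncd` and `knn_classify` are hypothetical.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Approximate the Kolmogorov complexity of data by its zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed form of the compression distance)."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(message: bytes, labelled: list[tuple[bytes, str]], k: int = 3) -> str:
    """Return the majority label among the k labelled messages closest to message."""
    neighbours = sorted(labelled, key=lambda item: ncd(message, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In use, `labelled` would hold pairs of raw e-mail bytes and a label such as "spam" or "ham", and `knn_classify(new_mail, labelled, k=3)` would return the predicted label. On short messages a general-purpose compressor gives noisy distances, so any real evaluation would need a realistic corpus and a careful choice of compressor and k.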