A Chinese anti-spam filter approach based on Support Vector Machine

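Authors
Pang Xiu-li [1]
Feng Yu-qiang [1]
Jiang Wei [1]
Affiliation
[1] Harbin Institute of Technology, School of Management, Harbin 150001, Heilongjiang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
anti-spam filter; maximum entropy; Naive Bayes; support vector machine
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02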
Abstract
This paper presents an anti-spam filtering approach based on the Support Vector Machine (SVM). First, we adopt a tri-gram language model to perform word segmentation on Chinese emails; to overcome the sparse-data problem, the Absolute Discount smoothing algorithm is applied. Second, factoid words of different types are identified by an automaton, in order to capture the approximate syntactic and semantic usage of factoid words in the anti-spam filtering task. Third, we apply the SVM to filter spam, where emails may be written in mixed languages, including Chinese and English. Experiments on large-scale cross-language corpora show that the SVM generalizes better than Naive Bayes (smoothed with the Lidstone algorithm), achieving 4.09% higher precision, and achieves 8.18% higher precision than the Maximum Entropy model.
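The abstract does not say how the tri-gram model drives segmentation. A common approach is a dynamic-programming (Viterbi-style) search that keeps, for each text position, the best-scoring path ending in each pair of last two words. The sketch below assumes that approach; the lexicon, the `max_len` window, the single-character fallback, and the scorer `logprob(u, v, w)` are hypothetical stand-ins, not the paper's components.

```python
import math


def segment(text, lexicon, logprob, max_len=4):
    """Return the highest-scoring segmentation of `text` under a trigram scorer.

    `logprob(u, v, w)` must return log P(w | u, v); `lexicon` is a set of
    multi-character words. Both are hypothetical inputs for this sketch.
    """
    n = len(text)
    # chart[i] maps the last two words (u, v) of a partial segmentation of
    # text[:i] to (best log-probability, backpointer to the previous state).
    chart = [dict() for _ in range(n + 1)]
    chart[0][("<s>", "<s>")] = (0.0, None)
    for i in range(n):
        for (u, v), (score, _) in chart[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                w = text[i:j]
                # Multi-character candidates must be lexicon words; single
                # characters are always allowed so any string can be segmented.
                if j - i > 1 and w not in lexicon:
                    continue
                cand = score + logprob(u, v, w)
                if (v, w) not in chart[j] or cand > chart[j][(v, w)][0]:
                    chart[j][(v, w)] = (cand, (i, u, v))
    # Close the sentence with an end marker and keep the best final state.
    best_state, best_score = None, -math.inf
    for (u, v), (score, _) in chart[n].items():
        total = score + logprob(u, v, "</s>")
        if total > best_score:
            best_state, best_score = (u, v), total
    # Follow backpointers from the end to recover the word sequence.
    words, i, state = [], n, best_state
    while i > 0:
        _, (prev_i, u, v) = chart[i][state]
        words.append(state[1])
        i, state = prev_i, (u, v)
    return words[::-1]
```

With a toy context-free scorer that favours the lexicon words 垃圾 and 邮件, `segment("垃圾邮件", {"垃圾", "邮件"}, scorer)` prefers the two-word split over any split into single characters.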
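Absolute discounting subtracts a fixed discount D from every observed n-gram count and redistributes the freed probability mass through a lower-order distribution. The sketch below uses the interpolated form P(w | u, v) = max(c(u, v, w) - D, 0) / c(u, v) + D * N1+(u, v) / c(u, v) * P(w | v), where N1+(u, v) is the number of distinct words seen after (u, v), recursing to a maximum-likelihood unigram. The class name, D = 0.5, and the interpolated (rather than back-off) variant are assumptions; the abstract names only the smoothing family.

```python
from collections import Counter, defaultdict


class AbsoluteDiscountTrigram:
    """Interpolated absolute-discount trigram model (illustrative sketch).

    Assumes one discount D (0 < D <= 1) shared by the trigram and bigram levels
    and a maximum-likelihood unigram base, which gives unseen words probability 0.
    """

    def __init__(self, sentences, discount=0.5):
        self.d = discount
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()  # history counts
        self.bi_types = defaultdict(set)   # distinct w seen after v
        self.tri_types = defaultdict(set)  # distinct w seen after (u, v)
        for words in sentences:
            toks = ["<s>", "<s>"] + list(words) + ["</s>"]
            for w in toks[2:]:
                self.uni[w] += 1
            for i in range(2, len(toks)):
                u, v, w = toks[i - 2], toks[i - 1], toks[i]
                self.bi[(v, w)] += 1
                self.bi_ctx[v] += 1
                self.bi_types[v].add(w)
                self.tri[(u, v, w)] += 1
                self.tri_ctx[(u, v)] += 1
                self.tri_types[(u, v)].add(w)
        self.total = sum(self.uni.values())

    def p_uni(self, w):
        return self.uni[w] / self.total

    def p_bi(self, v, w):
        c_v = self.bi_ctx[v]
        if c_v == 0:  # unseen history: fall back entirely to the unigram
            return self.p_uni(w)
        discounted = max(self.bi[(v, w)] - self.d, 0) / c_v
        freed_mass = self.d * len(self.bi_types[v]) / c_v
        return discounted + freed_mass * self.p_uni(w)

    def p_tri(self, u, v, w):
        c_uv = self.tri_ctx[(u, v)]
        if c_uv == 0:  # unseen history: fall back entirely to the bigram
            return self.p_bi(v, w)
        discounted = max(self.tri[(u, v, w)] - self.d, 0) / c_uv
        freed_mass = self.d * len(self.tri_types[(u, v)]) / c_uv
        return discounted + freed_mass * self.p_bi(v, w)
```

Because the freed mass exactly matches what was subtracted, each conditional distribution still sums to 1 over the training vocabulary, while unseen trigrams receive non-zero probability whenever the lower-order model does.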
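Factoid words such as numbers, percentages, and amounts of money have open-ended surface forms, so a filter gains little from memorizing each value; mapping each factoid to a class tag lets the classifier learn from the class instead. The abstract says only that an automaton identifies factoids. The sketch below hand-codes one small deterministic finite automaton with longest-match scanning; the three classes (`NUM`, `PERCENT`, `MONEY`), their patterns, and the `<TAG>` output format are hypothetical.

```python
DIGITS = set("0123456789０１２３４５６７８９")  # ASCII and full-width digits

# Hypothetical transition table: state -> {input symbol -> next state}.
TRANSITIONS = {
    0: {"d": 1},                           # start
    1: {"d": 1, ".": 2, "%": 4, "元": 5},  # integer part
    2: {"d": 3},                           # just read a decimal point
    3: {"d": 3, "%": 4, "元": 5},          # fractional part
    4: {},                                 # percentage
    5: {},                                 # amount of money
}
# Accepting states and the factoid class each one reports.
ACCEPT = {1: "NUM", 3: "NUM", 4: "PERCENT", 5: "MONEY"}


def char_class(ch):
    """Map a character to the input symbol the automaton understands."""
    if ch in DIGITS:
        return "d"
    if ch in "%％":
        return "%"
    if ch in ".元":
        return ch
    return None


def longest_match(text, start):
    """Run the DFA from `start`; return (end, tag) of the longest accepted prefix."""
    state, i, best = 0, start, None
    while i < len(text):
        nxt = TRANSITIONS[state].get(char_class(text[i]))
        if nxt is None:
            break
        state, i = nxt, i + 1
        if state in ACCEPT:
            best = (i, ACCEPT[state])
    return best


def tag_factoids(text):
    """Replace every factoid with a <CLASS> tag, leaving other characters as-is."""
    out, i = [], 0
    while i < len(text):
        match = longest_match(text, i)
        if match:
            end, tag = match
            out.append(f"<{tag}>")
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `tag_factoids("仅需99元，折扣50%，版本3.5")` returns `"仅需<MONEY>，折扣<PERCENT>，版本<NUM>"`. Longest match matters here: `99` alone is already an accepted `NUM`, but the scan continues and reports the longer `99元` as `MONEY`.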
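For the classification step, the sketch below trains a linear soft-margin SVM on binary bag-of-words features with the Pegasos stochastic sub-gradient method. This solver is a swapped-in illustration: the abstract does not say which kernel, solver, or feature weighting the authors used. The bias is folded in as a constant feature (so it is also regularized), which is a simplification, and `featurize`, `train_pegasos`, and the `lam`/`epochs` values are hypothetical.

```python
import random


def featurize(tokens):
    """Hypothetical scheme: binary bag-of-words plus a constant bias feature."""
    x = {tok: 1.0 for tok in tokens}
    x["__bias__"] = 1.0
    return x


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(samples, labels, lam=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM trained by Pegasos sub-gradient descent.

    `labels` must be +1 (spam) or -1 (ham); weights are a sparse dict.
    """
    rng = random.Random(seed)
    w, t, n = {}, 0, len(samples)
    for _ in range(epochs):
        for _ in range(n):
            t += 1
            i = rng.randrange(n)
            x, y = samples[i], labels[i]
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # computed with the weights before this step
            # Shrink toward zero: the gradient step for (lam / 2) * ||w||^2.
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient applies only when the margin is violated.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1
```

Pegasos is used here only because it fits in a few dependency-free lines; in practice a linear SVM is usually trained with an established library solver.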
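The Naive Bayes baseline uses Lidstone smoothing, which adds a pseudo-count λ to every word count (0 < λ < 1; λ = 1 gives Laplace smoothing): P(w | c) = (count(w, c) + λ) / (N_c + λ|V|), where N_c is the total word count of class c and |V| the vocabulary size. A minimal multinomial sketch follows; the class name, λ = 0.5, and skipping words unseen in training are assumptions, not details from the paper.

```python
import math
from collections import Counter


class LidstoneNaiveBayes:
    """Multinomial Naive Bayes with Lidstone-smoothed word likelihoods (sketch)."""

    def __init__(self, lam=0.5):
        self.lam = lam  # Lidstone pseudo-count

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.word_counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.word_counts[y].update(d)
        doc_counts, n = Counter(labels), len(labels)
        self.log_prior = {c: math.log(doc_counts[c] / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, w, c):
        num = self.word_counts[c][w] + self.lam
        den = self.totals[c] + self.lam * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in doc:
                if w in self.vocab:  # ignore words never seen in training
                    s += self.log_likelihood(w, c)
            scores[c] = s
        return max(scores, key=scores.get)
```

Summing log-probabilities instead of multiplying probabilities avoids floating-point underflow on long emails.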
Pages: 97-102
Number of pages: 6