Two-step based hybrid feature selection method for spam filtering

Times cited: 9
Authors
Wang, Youwei [1]
Liu, Yuanning [1]
Zhu, Xiaodong [1]
Affiliation
[1] Jilin University, College of Computer Science and Technology, Changchun 130023, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; spam filtering; particle swarm optimization; convergence rate; Support Vector Machine; Naive Bayesian; classification; algorithm
DOI
10.3233/IFS-141240
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection, which reduces the dimensionality of the vector space without sacrificing classifier performance, is commonly used in spam filtering. Because many classifiers cannot handle high-dimensional feature spaces, noisy, irrelevant and redundant features should be removed from the feature space. In this paper, a two-step hybrid feature selection method, called TFSM, is proposed. First, we select the most discriminative features with an existing document-frequency-based feature selection method (called ODFFS). Second, we select the remaining features by combining ODFFS with a newly proposed term-frequency-based feature selection method (called NTFFS). Moreover, we propose a new meta-heuristic optimization method, called GOPSO, to improve the convergence rate of standard particle swarm optimization. In the experiments, Support Vector Machine (SVM) and Naive Bayesian (NB) classifiers are applied to four corpora: PU2, PU3, Enron-spam and Trec2007. The experimental results show that TFSM is significantly superior to information gain, comprehensively measure feature selection, t-test based feature selection, term frequency based information gain and the improved term frequency inverse document frequency method on all four corpora, with both SVM and NB.
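The abstract does not give the ODFFS or NTFFS formulas, so the Python sketch below only illustrates the two-step structure under stated assumptions; it is not the authors' TFSM. The document-frequency score (absolute difference between the fraction of spam and ham documents containing a term) and the term-frequency score (absolute difference between a term's share of all spam tokens and of all ham tokens) are hypothetical stand-ins, as is the weighted sum with weight `alpha` used to combine them in step 2. The function names `doc_freq_scores`, `term_freq_scores` and `two_step_select`, and the parameters `k` and `k_first`, are likewise illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence

# NOTE: both scoring functions are hypothetical stand-ins, not the paper's
# ODFFS/NTFFS formulas. Labels: 1 = spam, 0 = ham.


def doc_freq_scores(docs: Sequence[List[str]], labels: Sequence[int]) -> Dict[str, float]:
    """DF-style score: |fraction of spam docs containing t - fraction of ham docs containing t|."""
    n_spam = max(sum(1 for y in labels if y == 1), 1)  # guard against an empty class
    n_ham = max(sum(1 for y in labels if y != 1), 1)
    df_spam, df_ham = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # set() so each document counts a term at most once
        (df_spam if y == 1 else df_ham).update(set(tokens))
    vocab = set(df_spam) | set(df_ham)
    return {t: abs(df_spam[t] / n_spam - df_ham[t] / n_ham) for t in vocab}


def term_freq_scores(docs: Sequence[List[str]], labels: Sequence[int]) -> Dict[str, float]:
    """TF-style score: |share of spam tokens equal to t - share of ham tokens equal to t|."""
    tf_spam, tf_ham = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (tf_spam if y == 1 else tf_ham).update(tokens)
    total_spam = sum(tf_spam.values()) or 1
    total_ham = sum(tf_ham.values()) or 1
    vocab = set(tf_spam) | set(tf_ham)
    return {t: abs(tf_spam[t] / total_spam - tf_ham[t] / total_ham) for t in vocab}


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the maximum score."""
    top = max(scores.values(), default=0.0) or 1.0
    return {t: v / top for t, v in scores.items()}


def two_step_select(docs, labels, k: int, k_first: int, alpha: float = 0.5) -> List[str]:
    df = doc_freq_scores(docs, labels)
    tf = term_freq_scores(docs, labels)
    k_first = min(k_first, k)

    # Step 1: take the k_first terms ranked by the DF score alone
    # (ties broken alphabetically so the result is deterministic).
    selected = sorted(df, key=lambda t: (-df[t], t))[:k_first]
    chosen = set(selected)

    # Step 2: fill the remaining slots by a weighted combination of the
    # max-normalized DF and TF scores (hypothetical combination rule).
    ndf, ntf = _normalize(df), _normalize(tf)
    combined = {t: alpha * ndf[t] + (1 - alpha) * ntf.get(t, 0.0)
                for t in df if t not in chosen}
    ranked = sorted(combined, key=lambda t: (-combined[t], t))
    return selected + ranked[:max(k - len(selected), 0)]


if __name__ == "__main__":
    docs = [
        ["win", "money", "now"],
        ["win", "prize", "money", "money"],
        ["meeting", "tomorrow", "now"],
        ["project", "meeting", "notes"],
    ]
    labels = [1, 1, 0, 0]
    print(two_step_select(docs, labels, k=4, k_first=2))
    # -> ['meeting', 'money', 'win', 'notes']
```

With `alpha = 1` step 2 reduces to plain DF ranking; lowering `alpha` lets the TF criterion reorder the remaining slots. The selected terms would then define the feature space handed to a classifier such as SVM or NB.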
Pages: 2785-2796
Page count: 12
Related papers (50 in total)
  • [41] FEATURE SELECTION USING PARTICLE SWARM OPTIMIZATION WITH APPLICATION IN SPAM FILTERING
    Lai, Chih-Chin
    Wu, Chih-Hung
    Tsai, Ming-Chi
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (02): 423-432
  • [42] Two-step filtering datamining method integrating case-based reasoning and rule induction
    Park, Yoon-Joo
    Choi, Enmi
    Park, Soo-Hyun
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (01): 861-871
  • [43] An Artificial Immune System with Local Feature Selection classifier for Spam Filtering
    Kalbhor, Mayank
    Shrivastava, Shailendra
    Ujjainiya, Babita
    2013 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS AND NETWORKING TECHNOLOGIES (ICCCNT), 2013
  • [44] Dynamic feature selection for spam filtering using Support Vector Machine
    Islam, Md. Rafiqul
    Zhou, Wanlei
    Choudhury, Morshed U.
    6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, 2007: 757+
  • [45] Feature Tracking by Two-Step Optimization
    Schnorr, Andrea
    Helmrich, Dirk N.
    Denker, Dominik
    Kuhlen, Torsten W.
    Hentschel, Bernd
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26 (06): 2219-2233
  • [46] Unsupervised Feature Selection for Spherical Data Modeling: Application to Image-Based Spam Filtering
    Amayri, Ola
    Bouguila, Nizar
    MULTIMEDIA COMMUNICATIONS, SERVICES AND SECURITY, 2012, 287: 13-23
  • [47] Study on the spam-filtering system based on feature selection mechanism and improved SVM classification
    Liu, Shouqiang
    Qi, Deyu
    Liu, Bo
    Pan, Chunhua
    Yang, Bo
    2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006: 1143-1146
  • [48] A two-step method for cusp catastrophe model construction based on the selection of important variables
    Zhang M.
    Fu D.-M.
    Cheng X.-Q.
    Yang B.-K.
    Hao W.-K.
    Chen Y.
    Shao L.-Z.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2023, 45 (01): 128-136
  • [49] Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods
    Tan, Jiu-Xin
    Dao, Fu-Ying
    Lv, Hao
    Feng, Peng-Mian
    Ding, Hui
    MOLECULES, 2018, 23 (08)
  • [50] Feature Based Stereo Matching Using Two-Step Expansion
    Wang, Liqiang
    Liu, Zhen
    Zhang, Zhonghua
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014