Two-step based hybrid feature selection method for spam filtering

Cited by: 9
Authors
Wang, Youwei [1]
Liu, Yuanning [1]
Zhu, Xiaodong [1]
Affiliation
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; spam filtering; particle swarm optimization; convergence rate; Support Vector Machine; Naive Bayesian; classification; algorithm
DOI
10.3233/IFS-141240
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection, which reduces the dimensionality of the vector space without sacrificing classifier performance, is commonly used in spam filtering. Because many classifiers cannot cope with high-dimensional feature spaces, noisy, irrelevant and redundant features should be removed from the feature space. In this paper, a two-step hybrid feature selection method, called TFSM, is proposed. First, the most discriminative features are selected by an existing document-frequency-based feature selection method (called ODFFS). Second, the remaining features are selected by combining ODFFS with a newly proposed term-frequency-based feature selection method (called NTFFS). Moreover, we propose a new meta-heuristic optimization method, called GOPSO, to improve the convergence rate of standard particle swarm optimization. In the experiments, Support Vector Machine (SVM) and Naive Bayesian (NB) classifiers are applied to four corpora: PU2, PU3, Enron-spam and Trec2007. The experimental results show that TFSM significantly outperforms information gain, comprehensively measure feature selection, t-test based feature selection, term-frequency-based information gain and the improved term frequency-inverse document frequency method on all four corpora, with both SVM and NB.
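The abstract names the two selection stages but does not define ODFFS or NTFFS, so the Python sketch below only illustrates the general shape of a two-step hybrid selection under stated assumptions. The document-frequency score `df_scores` and the term-frequency score `tf_scores` are hypothetical stand-ins (absolute differences of class-conditional frequencies), not the paper's formulas. The min-max sum used to combine them in step 2, and the names `two_step_select`, `k` and `k_first`, are likewise illustrative choices.

```python
from collections import Counter
from typing import Dict, List, Sequence

Doc = List[str]  # a tokenized e-mail; label 1 = spam, 0 = ham


def df_scores(docs: Sequence[Doc], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical DF-based score: |P(term in doc | spam) - P(term in doc | ham)|."""
    n_spam = sum(1 for y in labels if y == 1)
    n_ham = len(labels) - n_spam
    df_spam: Counter = Counter()
    df_ham: Counter = Counter()
    for doc, y in zip(docs, labels):
        # set(doc): each term counts at most once per document
        (df_spam if y == 1 else df_ham).update(set(doc))
    vocab = set(df_spam) | set(df_ham)
    return {t: abs(df_spam[t] / max(n_spam, 1) - df_ham[t] / max(n_ham, 1)) for t in vocab}


def tf_scores(docs: Sequence[Doc], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical TF-based score: |share of spam tokens - share of ham tokens|."""
    tf_spam: Counter = Counter()
    tf_ham: Counter = Counter()
    for doc, y in zip(docs, labels):
        # every occurrence counts, unlike the DF score above
        (tf_spam if y == 1 else tf_ham).update(doc)
    total_spam = max(sum(tf_spam.values()), 1)
    total_ham = max(sum(tf_ham.values()), 1)
    vocab = set(tf_spam) | set(tf_ham)
    return {t: abs(tf_spam[t] / total_spam - tf_ham[t] / total_ham) for t in vocab}


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so DF and TF scores can be added."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span > 0 else 0.0 for t, s in scores.items()}


def two_step_select(docs: Sequence[Doc], labels: Sequence[int], k: int, k_first: int) -> List[str]:
    """Pick k_first terms by DF score alone, then fill up to k terms by combined DF+TF score."""
    if not 0 <= k_first <= k:
        raise ValueError("need 0 <= k_first <= k")
    df = df_scores(docs, labels)
    tf = tf_scores(docs, labels)
    # Step 1: most discriminative terms by the DF-based score (ties broken alphabetically)
    first = sorted(df, key=lambda t: (-df[t], t))[:k_first]
    chosen = set(first)
    # Step 2: rank the remaining terms by the sum of normalized DF and TF scores
    df_n, tf_n = min_max(df), min_max(tf)
    remaining = [t for t in df if t not in chosen]
    second = sorted(remaining, key=lambda t: (-(df_n[t] + tf_n[t]), t))[: k - k_first]
    return first + second


docs = [["win", "money", "now"], ["win", "prize"], ["meeting", "now"], ["project", "meeting", "notes"]]
labels = [1, 1, 0, 0]
print(two_step_select(docs, labels, k=4, k_first=2))  # -> ['meeting', 'win', 'money', 'notes']
```

Step 1 relies on document frequency alone, so it favors terms that appear in many e-mails of one class; step 2 adds a term-frequency signal, which also reflects how often a term is repeated within e-mails. A term such as "now", equally common in both classes, scores zero under both measures and is ranked last.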
Pages: 2785-2796
Number of pages: 12
Related papers (50 in total)
  • [21] Feature Selection on Cancer Classification by a Two-Step Clustering Algorithm
    Liao, Bo
    Lu, Yan
    Zhu, Wen
    Li, Renfa
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2011, 8 (09) : 1792 - 1797
  • [22] Improving RBF networks by a two-step feature selection approach
    Scherf, M
    Brauer, W
    PROGRESS IN CONNECTIONIST-BASED INFORMATION SYSTEMS, VOLS 1 AND 2, 1998, : 249 - 252
  • [23] A Two-step Feature Selection Algorithm Adapting to Intrusion Detection
    Xiao, Lizhong
    Liu, Yunxiang
    FIRST IITA INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, : 618 - 622
  • [24] A two-step feature selection method for monitoring tool wear and its application to the coroning process
    Yum, Juil
    Kim, Tae Hyung
    Kannatey-Asibu, Elijah, Jr.
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2013, 64 (9-12): : 1355 - 1364
  • [26] Combining SVM with Orthogonal Centroid Feature Selection for Spam Filtering
    Zhou, Hong-liang
    Luo, Chang-yong
    INTERNATIONAL CONFERENCE ON COMPUTER, NETWORK SECURITY AND COMMUNICATION ENGINEERING (CNSCE 2014), 2014, : 290 - 297
  • [27] A two-step filtering-based iterative image reconstruction method for interior tomography
    Zhang, Hanming
    Li, Lei
    Yan, Bin
    Wang, Linyuan
    Cai, Ailong
    Hu, Guoen
    JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY, 2016, 24 (05) : 733 - 747
  • [28] FeSTwo, a two-step feature selection algorithm based on feature engineering and sampling for the chronological age regression problem
    Wei, Zhipeng
    Ding, Shiying
    Duan, Meiyu
    Liu, Shuai
    Huang, Lan
    Zhou, Fengfeng
    COMPUTERS IN BIOLOGY AND MEDICINE, 2020, 125
  • [29] Relaxing feature selection in spam filtering by using case-based reasoning systems
    Mendez, J. R.
    Fdez-Riverola, F.
    Glez-Pena, D.
    Diaz, F.
    Corchado, J. M.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 53 - +
  • [30] Prediction of Cyclin Protein Using Two-Step Feature Selection Technique
    Sun, Jia-Nan
    Yang, Hua-Yi
    Yao, Jing
    Ding, Hui
    Han, Shu-Guang
    Wu, Cheng-Yan
    Tang, Hua
    IEEE Access, 2020, 8 : 109535 - 109542