Two-step based hybrid feature selection method for spam filtering

Cited by: 9
Authors
Wang, Youwei [1]
Liu, Yuanning [1]
Zhu, Xiaodong [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; spam filtering; particle swarm optimization; convergence rate; Support Vector Machine; Naive Bayesian; CLASSIFICATION; ALGORITHM;
DOI
10.3233/IFS-141240
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Feature selection, which can reduce the dimensionality of the vector space without sacrificing classifier performance, is commonly used in spam filtering. Because many classifiers cannot handle high-dimensional feature spaces, noisy, irrelevant, and redundant data should be removed from the feature space. This paper proposes a two-step hybrid feature selection method called TFSM. First, the most discriminative features are selected by an existing document frequency based feature selection method (called ODFFS). Second, the remaining features are selected by combining ODFFS with a newly proposed term frequency based feature selection method (called NTFFS). Moreover, a new meta-heuristic optimization method, called GOPSO, is proposed to improve the convergence rate of standard particle swarm optimization. In the experiments, Support Vector Machine (SVM) and Naive Bayesian (NB) classifiers are applied to four corpora: PU2, PU3, Enron-spam, and Trec2007. The experimental results show that TFSM is significantly superior to information gain, comprehensively measure feature selection, t-test based feature selection, term frequency based information gain, and the improved term frequency inverse document frequency method on all four corpora with both SVM and NB.
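The two-step structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' TFSM: the abstract does not give the ODFFS or NTFFS scoring formulas or say how they are combined, so the function name `two_step_select`, the parameters `k` and `k1`, and the combination rule (sum of min-max-normalized scores) are all assumptions made for illustration. The sketch only shows the control flow: a first pass that keeps the top features by a document frequency based score, then a second pass that fills the remaining slots using both scores.

```python
from typing import Dict, List


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant input maps to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def two_step_select(
    df_score: Dict[str, float],
    tf_score: Dict[str, float],
    k: int,
    k1: int,
) -> List[str]:
    """Pick k features: k1 by a DF-based score, the rest by a combined score.

    df_score and tf_score are hypothetical stand-ins for the ODFFS and
    NTFFS scores; higher means more discriminative. Both dictionaries
    must cover the same terms.
    """
    if not 0 <= k1 <= k <= len(df_score):
        raise ValueError("require 0 <= k1 <= k <= number of features")

    # Step 1: keep the k1 terms with the highest DF-based score
    # (ties broken alphabetically so the result is deterministic).
    by_df = sorted(df_score, key=lambda t: (-df_score[t], t))
    selected = by_df[:k1]

    # Step 2: rank the remaining terms by an assumed combination of
    # both scores (sum of normalized values) and fill the other slots.
    rest = by_df[k1:]
    if k == k1 or not rest:
        return selected
    df_n = _normalize({t: df_score[t] for t in rest})
    tf_n = _normalize({t: tf_score[t] for t in rest})
    rest.sort(key=lambda t: (-(df_n[t] + tf_n[t]), t))
    return selected + rest[: k - k1]
```

With made-up scores in which "free" and "meeting" have the highest document frequency based scores, calling `two_step_select(df, tf, k=3, k1=2)` keeps those two in step one and lets the term frequency score influence only the third slot. Any other combination rule, such as a weighted sum or a rank product, could be substituted in step two.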
Pages: 2785-2796
Page count: 12
Related papers
50 in total
  • [1] Two-step based feature selection method for filtering redundant information
    Wang, Youwei
    Feng, Lizhou
    Li, Yang
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 33 (04) : 2059 - 2073
  • [2] Two-Step Spam Message Filtering Method Based on Optimal Segmentation Strategy
    Wan Difei
    Chen Jieshu
    PROCEEDINGS OF 2009 CONFERENCE ON COMMUNICATION FACULTY, 2009, : 87 - 92
  • [3] Term frequency combined hybrid feature selection method for spam filtering
    Liu, Yuanning
    Wang, Youwei
    Feng, Lizhou
    Zhu, Xiaodong
    PATTERN ANALYSIS AND APPLICATIONS, 2016, 19 (02) : 369 - 383
  • [4] Term frequency combined hybrid feature selection method for spam filtering
    Yuanning Liu
    Youwei Wang
    Lizhou Feng
    Xiaodong Zhu
    Pattern Analysis and Applications, 2016, 19 : 369 - 383
  • [5] Spam Filtering Based on Improved CHI Feature Selection Method
    Lu, Zhimao
    Yu, Hongxia
    Fan, Dongmei
    Yuan, Chaoyue
    PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 771 - 773
  • [6] A new semantic-based feature selection method for spam filtering
    Mendez, Jose R.
    Cotos-Yanez, Tomas R.
    Ruano-Ordas, David
    APPLIED SOFT COMPUTING, 2019, 76 : 89 - 104
  • [7] Feature Selection and Similarity Coefficient Based Method for Email Spam Filtering
    Abdelrahim, Ali Ahmed A.
    Elhadi, Ammar Ahmed E.
    Ibrahim, Hamza
    Elmisbah, Naser
    2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONICS ENGINEERING (ICCEEE), 2013, : 630 - 633
  • [8] Feature selection for spam filtering
    Menghour, Kamilia
    Souici-Meslati, Labiba
    CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 349 - 360
  • [9] Feature selection and ensemble construction: A two-step method for aspect based sentiment analysis
    Akhtar, Md Shad
    Gupta, Deepak
    Ekbal, Asif
    Bhattacharyya, Pushpak
    KNOWLEDGE-BASED SYSTEMS, 2017, 125 : 116 - 135
  • [10] An evidential classifier based on feature selection and two-step classification strategy
    Lian, Chunfeng
    Ruan, Su
    Denoeux, Thierry
    PATTERN RECOGNITION, 2015, 48 (07) : 2318 - 2327