A Local-Concentration-Based Feature Extraction Approach for Spam Filtering

Cited by: 35
Authors
Zhu, Yuanchun [1, 2]
Tan, Ying [1, 2]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Dept Machine Intelligence, Beijing 100871, Peoples R China
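Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords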
Artificial immune system (AIS); bag-of-words (BoW); feature extraction; global concentration (GC); local concentration (LC); spam filtering;
DOI
10.1109/TIFS.2010.2103060
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Inspired by the biological immune system, we propose a local concentration (LC)-based feature extraction approach for spam filtering. The LC approach is designed to extract position-correlated information from messages by transforming each area of a message into a corresponding LC feature. Two implementation strategies of the LC approach are designed, one using a fixed-length sliding window and one using a variable-length sliding window. To incorporate the LC approach into the whole process of spam filtering, a generic LC model is designed. In the LC model, two types of detector sets are first generated using term selection methods and a well-defined tendency threshold. A sliding window is then adopted to divide the message into individual areas. After segmentation of the message, the concentration of detectors is calculated and taken as the feature of each local area. Finally, the features of all local areas are combined into a feature vector for the message. To evaluate the proposed LC model, several experiments are conducted on five benchmark corpora using cross-validation. The results show that the LC approach works well with three term selection methods, which makes it flexibly applicable in the real world. Compared to the global-concentration-based approach and the prevalent bag-of-words approach, the LC approach performs better in terms of both accuracy and F1 measure. The LC approach is also shown to be robust against messages of variable length.
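The pipeline described in the abstract (detector sets, sliding window, per-area concentrations, combined feature vector) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the tendency definition (difference of document frequencies between the two classes), the non-overlapping fixed-length window, the concentration as a ratio over distinct terms in an area, and zero-padding of empty areas are all assumptions made for the example. The term selection step that the abstract mentions is omitted.

```python
from typing import List, Set, Tuple


def build_detector_sets(
    spam_docs: List[List[str]],
    ham_docs: List[List[str]],
    threshold: float,
) -> Tuple[Set[str], Set[str]]:
    """Split terms into spam and legitimate detector sets.

    Assumption: a term's tendency is the difference between the fraction
    of spam messages and the fraction of legitimate messages containing it.
    """
    vocab = {term for doc in spam_docs + ham_docs for term in doc}
    spam_detectors: Set[str] = set()
    ham_detectors: Set[str] = set()
    for term in vocab:
        p_spam = sum(term in doc for doc in spam_docs) / len(spam_docs)
        p_ham = sum(term in doc for doc in ham_docs) / len(ham_docs)
        tendency = p_spam - p_ham
        if tendency > threshold:
            spam_detectors.add(term)
        elif tendency < -threshold:
            ham_detectors.add(term)
    return spam_detectors, ham_detectors


def local_concentrations(
    tokens: List[str],
    spam_detectors: Set[str],
    ham_detectors: Set[str],
    window: int,
    n_windows: int,
) -> List[float]:
    """Return a 2 * n_windows feature vector for one tokenized message.

    Assumption: areas are non-overlapping windows of `window` tokens; each
    area yields (spam concentration, legitimate concentration), computed
    over its distinct terms. Areas past the end of the message yield zeros.
    """
    features: List[float] = []
    for i in range(n_windows):
        area = set(tokens[i * window:(i + 1) * window])
        if area:
            features.append(len(area & spam_detectors) / len(area))
            features.append(len(area & ham_detectors) / len(area))
        else:
            features.extend([0.0, 0.0])
    return features


# Toy data, invented purely for illustration.
spam_docs = [["free", "cash", "now"], ["free", "offer"]]
ham_docs = [["meeting", "now"], ["project", "meeting"]]
spam_set, ham_set = build_detector_sets(spam_docs, ham_docs, threshold=0.4)
message = ["free", "cash", "now", "meeting", "project"]
vector = local_concentrations(message, spam_set, ham_set, window=2, n_windows=3)
```

The resulting vector has a fixed length regardless of message length, so it could be passed to any standard classifier. A variable-length window strategy would instead set the window size per message so that the message is divided into exactly `n_windows` areas.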
Pages: 486-497
Number of pages: 12