A Local-Concentration-Based Feature Extraction Approach for Spam Filtering

Cited by: 35
Authors
Zhu, Yuanchun [1 ,2 ]
Tan, Ying [1 ,2 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Dept Machine Intelligence, Beijing 100871, Peoples R China
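Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China;
Keywords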
Artificial immune system (AIS); bag-of-words (BoW); feature extraction; global concentration (GC); local concentration (LC); spam filtering;
DOI
10.1109/TIFS.2010.2103060
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Inspired by the biological immune system, we propose a local concentration (LC)-based feature extraction approach for spam filtering. The LC approach extracts position-correlated information from messages by transforming each area of a message into a corresponding LC feature. Two implementation strategies of the LC approach are designed, one using a fixed-length sliding window and one using a variable-length sliding window. To incorporate the LC approach into the whole process of spam filtering, a generic LC model is designed. In the LC model, two types of detector sets are first generated by using term selection methods and a well-defined tendency threshold. A sliding window is then used to divide the message into individual areas. After the message is segmented, the concentration of detectors in each local area is calculated and taken as the feature of that area. Finally, the features of all local areas are combined into a feature vector for the message. To evaluate the proposed LC model, several experiments are conducted on five benchmark corpora using cross-validation. The results show that the LC approach works well with three term selection methods, which makes it flexibly applicable in practice. Compared to the global-concentration-based approach and the prevalent bag-of-words approach, the LC approach performs better in terms of both accuracy and F1 measure. The LC approach is also shown to be robust to variation in message length.
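The pipeline described in the abstract (detector sets, sliding window, per-area concentrations, combined feature vector) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the tendency formula (difference of document-frequency estimates between spam and ham), and the concentration formula (fraction of distinct terms in an area that match a detector set) are assumptions made for illustration; the term selection step is omitted, and only the variable-length window strategy is shown, interpreted here as dividing the message into a fixed number of areas.

```python
from typing import Dict, List, Set, Tuple


def build_detector_sets(
    spam_docs: List[List[str]],
    ham_docs: List[List[str]],
    threshold: float,
) -> Tuple[Set[str], Set[str]]:
    """Split terms into a spam detector set and a ham detector set.

    Assumption: tendency(t) = P(t | spam) - P(t | ham), with both
    probabilities estimated as document frequencies. A term joins a
    detector set only if its tendency exceeds the threshold in magnitude.
    """
    def doc_freq(docs: List[List[str]]) -> Dict[str, float]:
        counts: Dict[str, int] = {}
        for doc in docs:
            for term in set(doc):
                counts[term] = counts.get(term, 0) + 1
        return {t: c / len(docs) for t, c in counts.items()}

    p_spam = doc_freq(spam_docs)
    p_ham = doc_freq(ham_docs)
    spam_set: Set[str] = set()
    ham_set: Set[str] = set()
    for term in set(p_spam) | set(p_ham):
        tendency = p_spam.get(term, 0.0) - p_ham.get(term, 0.0)
        if tendency > threshold:
            spam_set.add(term)
        elif tendency < -threshold:
            ham_set.add(term)
    return spam_set, ham_set


def split_windows(tokens: List[str], n_windows: int) -> List[List[str]]:
    """Divide a message into n_windows contiguous areas of near-equal size."""
    size, extra = divmod(len(tokens), n_windows)
    windows, start = [], 0
    for i in range(n_windows):
        end = start + size + (1 if i < extra else 0)
        windows.append(tokens[start:end])
        start = end
    return windows


def lc_features(
    tokens: List[str],
    spam_set: Set[str],
    ham_set: Set[str],
    n_windows: int,
) -> List[float]:
    """Return [spam_conc, ham_conc] for each area, concatenated in order."""
    features: List[float] = []
    for window in split_windows(tokens, n_windows):
        distinct = set(window)
        if not distinct:
            # Empty area (message shorter than n_windows): no detectors match.
            features.extend([0.0, 0.0])
            continue
        features.append(len(distinct & spam_set) / len(distinct))
        features.append(len(distinct & ham_set) / len(distinct))
    return features
```

Because the number of areas is fixed, every message yields a feature vector of length `2 * n_windows` regardless of its length, and this vector can then be passed to any standard classifier.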
Pages: 486-497
Page count: 12
Related papers
50 entries in total
  • [21] An enhanced algorithm for semantic-based feature reduction in spam filtering
    Novo-Loures, Maria
    Pavon, Reyes
    Laza, Rosalia
    Mendez, Jose R.
    Ruano-Ordas, David
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [22] Efficient Feature Set for Spam Email Filtering
    Varghese, Reshma
    Dhanya, K. A.
    2017 7TH IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2017, : 732 - 737
  • [23] Unsupervised feature learning for spam email filtering
    Diale, Melvin
    Celik, Turgay
    Van Der Walt, Christiaan
    COMPUTERS & ELECTRICAL ENGINEERING, 2019, 74 : 89 - 104
  • [24] Artificial immunity-based feature extraction for spam detection
    Sirisanyalak, Burim
    Sornil, Ohm
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 359 - +
  • [25] Feature Extraction and Classification of Spam Emails
    Hassan, Muhammad Ali
    Mtetwa, Nhamo
    2018 5TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI), 2018, : 93 - 98
  • [26] Feature extraction for collaborative filtering: A genetic programming approach
    Anand, Deepa
    International Journal of Computer Science Issues, 2012, 9 (5 5-1): : 348 - 354
  • [27] Content-based Approach for Vietnamese Spam SMS Filtering
    Pham, Thai-Hoang
    Le-Hong, Phuong
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 41 - 44
  • [28] Using a probable weight based Bayesian approach for spam filtering
    Anayat, S
    Ali, A
    Ahmad, HF
    INMIC 2004: 8th International Multitopic Conference, Proceedings, 2004, : 340 - 345
  • [29] A new feature selection algorithm based on binomial hypothesis testing for spam filtering
    Yang, Jieming
    Liu, Yuanning
    Liu, Zhen
    Zhu, Xiaodong
    Zhang, Xiaoxu
    KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 904 - 914
  • [30] Two-step based hybrid feature selection method for spam filtering
    Wang, Youwei
    Liu, Yuanning
    Zhu, Xiaodong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 27 (06) : 2785 - 2796