The Impact of Feature Extraction and Selection on SMS Spam Filtering

Cited by: 19
Authors
Uysal, A. K. [1]
Gunal, S. [1]
Ergin, S. [2]
Gunal, E. Sora [2]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
[2] Eskisehir Osmangazi Univ, Dept Elect & Elect Engn, Eskisehir, Turkey
Keywords
Feature extraction; feature selection; SMS; spam filter; TEXT; ALGORITHM;
DOI
10.5755/j01.eee.19.5.1829
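Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract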
This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features derived from the bag-of-words (BoW) model together with a set of structural features (SFs) specific to the spam problem. Distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was also compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages.
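The pipeline described in the abstract (BoW features, information-theoretic selection, and added structural features) might look roughly like the following minimal Python sketch. The abstract does not name the selection methods or list the structural features, so everything here is an assumption made for illustration: information gain stands in for the selection criterion, the four structural features (length, digit count, uppercase count, URL flag) are hypothetical, and the toy messages are invented. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entropy(pos, neg):
    """Entropy in bits of a two-class split with the given counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(docs, labels, term):
    """Information gain of term presence/absence with respect to the label."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_before = entropy(sum(labels), n - sum(labels))
    h_after = 0.0
    for part in (with_term, without):
        if part:
            h_after += len(part) / n * entropy(sum(part), len(part) - sum(part))
    return h_before - h_after


def select_terms(messages, labels, k):
    """Return the k terms with the highest information gain (ties: alphabetical)."""
    docs = [set(tokenize(m)) for m in messages]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def structural_features(message):
    """Hypothetical structural features; the paper's actual SF set may differ."""
    lowered = message.lower()
    return [
        len(message),                              # message length in characters
        sum(ch.isdigit() for ch in message),       # number of digits
        sum(ch.isupper() for ch in message),       # number of uppercase letters
        int("http" in lowered or "www." in lowered),  # crude URL indicator
    ]


def feature_vector(message, terms):
    """Concatenate selected BoW term counts with the structural features."""
    counts = Counter(tokenize(message))
    return [counts[t] for t in terms] + structural_features(message)


# Invented toy data: label 1 = spam, 0 = legitimate.
messages = [
    "WIN a FREE prize now call 09061234567",
    "FREE entry claim your prize at www.example.com",
    "Are we still meeting for lunch today",
    "Call me when you get home",
]
labels = [1, 1, 0, 0]

terms = select_terms(messages, labels, k=2)
vector = feature_vector(messages[0], terms)
print(terms, vector)
```

The resulting vectors could then be passed to any standard classifier; comparing runs with and without the structural part mirrors, in spirit, the BoW-only versus BoW-plus-SF comparison the abstract reports.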
Pages: 67-72
Page count: 6
Related papers
50 entries in total
  • [1] Feature selection for spam filtering
    Menghour, Kamilia
    Souici-Meslati, Labiba
    [J]. CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 349 - 360
  • [2] The Impact of Deep Learning Techniques on SMS Spam Filtering
    Gomaa, Wael Hassan
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 544 - 549
  • [3] A weighted feature enhanced Hidden Markov Model for spam SMS filtering
    Xia, Tian
    Chen, Xuemin
    [J]. NEUROCOMPUTING, 2021, 444 : 48 - 58
  • [4] A novel feature extraction approach in SMS spam filtering for mobile communication: one-dimensional ternary patterns
    Kaya, Yilmaz
    Ertugrul, Omer Faruk
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2016, 9 (17) : 4680 - 4690
  • [5] Efficient feature selection methods in Chinese spam filtering
    Xu, Yan
    [J]. Information Technology Journal, 2013, 12 (20) : 5492 - 5496
  • [6] SMS spam filtering: Methods and data
    Delany, Sarah Jane
    Buckley, Mark
    Greene, Derek
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (10) : 9899 - 9908
  • [7] Spam Filtering Based on Improved CHI Feature Selection Method
    Lu, Zhimao
    Yu, Hongxia
    Fan, Dongmei
    Yuan, Chaoyue
    [J]. PROCEEDINGS OF THE 2009 CHINESE CONFERENCE ON PATTERN RECOGNITION AND THE FIRST CJK JOINT WORKSHOP ON PATTERN RECOGNITION, VOLS 1 AND 2, 2009, : 771 - 773
  • [8] Combining SVM with Orthogonal Centroid Feature Selection for Spam Filtering
    Zhou, Hong-liang
    Luo, Chang-yong
    [J]. INTERNATIONAL CONFERENCE ON COMPUTER, NETWORK SECURITY AND COMMUNICATION ENGINEERING (CNSCE 2014), 2014, : 290 - 297
  • [9] Legitimate and spam SMS classification employing novel Ensemble feature selection algorithm
    Kumar, Shailender
    Gupta, Shweta
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19897 - 19927
  • [10] XAI-based Feature Selection for SMS Spam Classification in Dravidian Languages
    Thirumalai, K. G.
    Prakash, K. Sakthi
    Abirami, A. M.
    Ramanujam, E.
    Sumitra, S.
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY, ICITIIT 2024, 2024,