Legitimate and spam SMS classification employing novel Ensemble feature selection algorithm

Authors
Shailender Kumar
Shweta Gupta
Affiliation
[1] Delhi Technological University, Department of Computer Science and Engineering
Keywords
Spam; Feature selection; Machine learning; Ensemble model; Optimal feature subset;
DOI
Not available
Abstract
Short Message Service (SMS) classification is a crucial task that requires distinguishing undesirable and harmful messages from legitimate ones. Spam messages may include malicious links that expose users' personal data to fraudulent practices. Many Machine Learning (ML) and deep learning algorithms are used for spam detection. However, the large number of features present in messages degrades classifier performance, a problem that feature engineering can mitigate. This paper proposes an ensemble feature selection algorithm and assesses it on a high-dimensional, imbalanced dataset that resembles real-world data. The proposed work employs wrapper-based Recursive Feature Elimination with Cross-Validation (RFECV), using Stratified K-fold Cross-Validation (SKCV), over hyperparameter-tuned Support Vector Classifier (SVC) and random forest algorithms. The study contributes an efficient feature selection algorithm that produces a pruned feature subset that can substitute for the whole dataset. To establish the potential of the proposed approach, its performance is evaluated against five feature selection techniques: Mutual Information (MI), Low Variance (LV), Particle Swarm Optimization (PSO), Chi-square, and Recursive Feature Elimination (RFE). All obtained feature subsets are analyzed with three ML models: Logistic Regression (LR), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB). The research also compares the results of the proposed framework with a Long Short-Term Memory (LSTM) neural network to validate its competence.
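The wrapper-based selection step described in the abstract can be illustrated with scikit-learn. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the SVC-based and random-forest-based selections are combined, so the union of the two masks is an assumption, as are the synthetic data (standing in for a vectorized SMS corpus), the fixed hyperparameters (the paper tunes them), and the scoring metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Assumption: synthetic stand-in for a vectorized SMS corpus,
# 20 features with roughly 10% of samples in the minority ("spam") class.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2,
    weights=[0.9, 0.1], random_state=0,
)

# Stratified folds keep the class ratio similar in every split,
# which matters for an imbalanced dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def select_with_rfecv(estimator):
    """Run RFECV with the given estimator and return its boolean feature mask."""
    selector = RFECV(estimator, step=1, cv=cv, scoring="balanced_accuracy")
    selector.fit(X, y)
    return selector.support_


# RFECV needs coef_ or feature_importances_, so the SVC uses a linear kernel.
# Assumption: hyperparameters are fixed here rather than tuned.
svc_mask = select_with_rfecv(SVC(kernel="linear", C=1.0))
rf_mask = select_with_rfecv(
    RandomForestClassifier(n_estimators=25, random_state=0)
)

# Assumption: the "ensemble" keeps any feature selected by either wrapper.
ensemble_mask = svc_mask | rf_mask
X_pruned = X[:, ensemble_mask]

print("SVC-selected features:", int(svc_mask.sum()))
print("RF-selected features:", int(rf_mask.sum()))
print("Ensemble subset size:", X_pruned.shape[1])
```

Taking the intersection of the two masks instead of the union would give a smaller, more conservative subset; which rule the paper uses would need to be checked against the full text.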
Pages: 19897-19927
Page count: 30