Legitimate and spam SMS classification employing novel Ensemble feature selection algorithm

Cited by: 0
Authors
Shailender Kumar
Shweta Gupta
Affiliation
[1] Delhi Technological University, Department of Computer Science and Engineering
Source
Keywords
Spam; Feature selection; Machine learning; Ensemble model; Optimal feature subset;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The classification of Short Message Service (SMS) messages is a crucial task that requires distinguishing undesirable and harmful messages from legitimate ones. Spam messages may include malicious links that expose users' personal data to fraudulent practices. Many Machine Learning (ML) and deep learning algorithms are in use for spam detection. However, the large number of features present in messages degrades classifier performance, a problem that feature engineering can address. This paper proposes an ensemble feature selection algorithm and assesses it on a high-dimensional, imbalanced dataset that resembles real-world data. The proposed work applies wrapper-based Recursive Feature Elimination with Cross-Validation (RFECV), using Stratified K-fold Cross-Validation (SKCV), over hyperparameter-tuned Support Vector Classifier (SVC) and random forest algorithms. The study contributes an efficient feature selection algorithm that produces a pruned feature subset that can substitute for the whole dataset. To establish the potential of the proposed approach, its performance is evaluated against five feature selection techniques: Mutual Information (MI), Low Variance (LV), Particle Swarm Optimization (PSO), Chi-square, and Recursive Feature Elimination (RFE). All resulting feature subsets are analyzed with three ML models: Logistic Regression (LR), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB). The research also compares the results of the proposed framework with a Long Short-Term Memory (LSTM) neural network to validate its competence.
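As a rough illustration of the kind of pipeline the abstract describes, the sketch below runs scikit-learn's `RFECV` with a `StratifiedKFold` splitter over a linear SVC and a random forest, then merges the two selections. It is not the authors' implementation. The abstract does not say how the two selections are combined, so the union used here is an assumption, as are the synthetic imbalanced dataset, the F1 scoring, the fixed (untuned) hyperparameters, and the helper name `ensemble_rfecv_mask`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def ensemble_rfecv_mask(X, y, n_splits=5, seed=0):
    """Hypothetical helper: boolean mask of features kept by either RFECV run.

    Assumption: the two selections are merged by union; the abstract does
    not specify the combination rule.
    """
    # Stratified folds keep the spam/legitimate ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Fixed hyperparameters for brevity; the paper tunes them first.
    estimators = [
        SVC(kernel="linear", class_weight="balanced"),  # linear kernel exposes coef_
        RandomForestClassifier(
            n_estimators=50, class_weight="balanced", random_state=seed
        ),
    ]
    masks = []
    for estimator in estimators:
        # Drop two features per round; cross-validated F1 picks the subset size.
        selector = RFECV(estimator, step=2, cv=cv, scoring="f1")
        selector.fit(X, y)
        masks.append(selector.support_)
    return np.logical_or.reduce(masks)


# Synthetic stand-in for a vectorized SMS corpus (about 13% minority class).
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_redundant=2,
    weights=[0.87, 0.13],
    random_state=0,
)
mask = ensemble_rfecv_mask(X, y)
X_reduced = X[:, mask]  # pruned feature subset passed to downstream classifiers
print(f"kept {mask.sum()} of {X.shape[1]} features")
```

The reduced matrix `X_reduced` would then be passed to downstream classifiers such as LR, SVM, or XGB for comparison against the other selection techniques.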
Pages: 19897-19927
Page count: 30
Related papers
50 items in total
  • [1] Legitimate and spam SMS classification employing novel Ensemble feature selection algorithm
    Kumar, Shailender
    Gupta, Shweta
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19897 - 19927
  • [2] XAI-based Feature Selection for SMS Spam Classification in Dravidian Languages
    Thirumalai, K. G.
    Prakash, K. Sakthi
    Abirami, A. M.
    Ramanujam, E.
    Sumitra, S.
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY, ICITIIT 2024, 2024,
  • [3] The Impact of Feature Extraction and Selection on SMS Spam Filtering
    Uysal, A. K.
    Gunal, S.
    Ergin, S.
    Gunal, E. Sora
    [J]. ELEKTRONIKA IR ELEKTROTECHNIKA, 2013, 19 (05) : 67 - 72
  • [4] Research on the ensemble learning classification algorithm based on the novel feature selection method
    Yao Ming-hai
    Wang Na
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON VEHICULAR ELECTRONICS AND SAFETY (ICVES), 2013, : 263 - 267
  • [5] Ensemble Feature Selection for Android SMS Malware Detection
    Ibrahim, Syed F.
    Hossain, Md Sakir
    Islam, Md Moontasirul
    Mostofa, Md Golam
    [J]. ADVANCES IN CYBERSECURITY, CYBERCRIMES, AND SMART EMERGING TECHNOLOGIES, 2023, 4 : 15 - 26
  • [6] A random forest algorithm under the ensemble approach for feature selection and classification
    Kharwar, Ankit
    Thakor, Devendra
    [J]. INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2023, 29 (04) : 426 - 447
  • [7] Efficient Classification of DDoS Attacks Using an Ensemble Feature Selection Algorithm
    Singh, Khundrakpam Johnson
    De, Tanmay
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2020, 29 (01) : 71 - 83
  • [8] A Novel Hybrid Feature Selection Algorithm for Hierarchical Classification
    Lima, Helen C. S. C.
    Otero, Fernando E. B.
    Merschmann, Luiz H. C.
    Souza, Marcone J. F.
    [J]. IEEE ACCESS, 2021, 9 : 127278 - 127292
  • [9] Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification
    Adnan Idris
    Asifullah Khan
    Yeon Soo Lee
    [J]. Applied Intelligence, 2013, 39 : 659 - 672
  • [10] Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification
    Idris, Adnan
    Khan, Asifullah
    Lee, Yeon Soo
    [J]. APPLIED INTELLIGENCE, 2013, 39 (03) : 659 - 672