A Machine Learning-Based Framework with Enhanced Feature Selection and Resampling for Improved Intrusion Detection

Authors
Malik, Fazila [1]
Khan, Qazi Waqas [2]
Rizwan, Atif [2]
Alnashwan, Rana [3]
Atteia, Ghada [3]
Affiliations
[1] Iqra Univ Islamabad, Dept Comp Sci, Islamabad 44000, Pakistan
[2] Jeju Natl Univ, Dept Comp Engn, Jejusi 63243, South Korea
[3] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, POB 84428, Riyadh 11671, Saudi Arabia
Keywords
feature selection; data resampling; intrusion detection; applied machine learning; deep learning; INTERNET
DOI
10.3390/math12121799
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Intrusion Detection Systems (IDSs) play a crucial role in safeguarding network infrastructures from cyber threats and ensuring the integrity of highly sensitive data. Conventional IDS technologies, although they achieve high accuracy, frequently suffer from substantial model bias. This bias is primarily caused by class imbalance in the data and by irrelevant features. This study tackles these challenges by proposing a machine learning (ML) based IDS that minimizes misclassification errors and corrects model bias, which significantly improves the predictive accuracy and generalizability of the IDS. The proposed system employs feature selection techniques, namely Recursive Feature Elimination (RFE), sequential feature selection (SFS), and statistical feature selection, to refine the input feature set and minimize the impact of non-predictive attributes. In addition, this work incorporates data resampling methods, namely Synthetic Minority Oversampling Technique with Edited Nearest Neighbors (SMOTE_ENN), Adaptive Synthetic Sampling (ADASYN), and Synthetic Minority Oversampling Technique with Tomek Links (SMOTE_Tomek), to address class imbalance and improve the accuracy of the model. The experimental results indicate that the proposed model, especially when using the random forest (RF) algorithm, surpasses existing models in accuracy, precision, recall, and F-score across the different data resampling methods. Using the ADASYN resampling method, the RF model achieves an accuracy of 99.9985% for botnet attacks and 99.9777% for Man-in-the-Middle (MITM) attacks, demonstrating the effectiveness of the approach on imbalanced data distributions. This research not only improves the ability of IDSs to identify botnet and MITM attacks but also provides a scalable and efficient solution that can be used in other areas where data imbalance is a recurring problem. This work has implications beyond IDS, offering valuable insights into using ML techniques in complex real-world scenarios.
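The resampling methods named in the abstract (SMOTE_ENN, ADASYN, SMOTE_Tomek) all build on one idea: create synthetic minority-class samples by interpolating between existing minority samples and their nearest minority neighbours. The following is a minimal sketch of that core step only, not the authors' implementation. The function name `smote_oversample`, the parameter choices, and the toy "benign"/"attack" data are all hypothetical, and the cleaning steps (ENN, Tomek links) and the density-adaptive weighting of ADASYN are omitted.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea).
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 8 "benign" flows versus 3 "attack" flows, two features each.
benign = [(0.1 * i, 0.2 * i) for i in range(8)]
attack = [(5.0, 5.0), (5.5, 4.5), (6.0, 5.2)]

# Generate just enough synthetic attack samples to balance the two classes.
new_attack = smote_oversample(attack, n_new=len(benign) - len(attack), k=2)
balanced_attack = attack + new_attack
print(len(benign), len(balanced_attack))  # prints "8 8"
```

In practice one would more likely rely on an existing library than hand-roll this; for example, the imbalanced-learn package provides `SMOTE`, `ADASYN`, `SMOTEENN`, and `SMOTETomek` classes. Whether the paper used that package is not stated in the abstract.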
Pages: 25