A Machine Learning-Based Framework with Enhanced Feature Selection and Resampling for Improved Intrusion Detection

Cited by: 0

Authors:

Malik, Fazila [1]

Khan, Qazi Waqas [2]

Rizwan, Atif [2]

Alnashwan, Rana [3]

Atteia, Ghada [3]

Affiliations:

[1] Iqra Univ Islamabad, Dept Comp Sci, Islamabad 44000, Pakistan

[2] Jeju Natl Univ, Dept Comp Engn, Jejusi 63243, South Korea

[3] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, POB 84428, Riyadh 11671, Saudi Arabia

Source:

MATHEMATICS | 2024 / Volume 12 / Issue 12

Keywords:

feature selection; data resampling; intrusion detection; applied machine learning; deep learning; INTERNET

DOI:

10.3390/math12121799

Chinese Library Classification (CLC) number:

O1 [Mathematics]

Subject classification codes:

0701; 070101

Abstract:

Intrusion Detection Systems (IDSs) play a crucial role in safeguarding network infrastructures from cyber threats and ensuring the integrity of highly sensitive data. Conventional IDS technologies, although successful in achieving high levels of accuracy, frequently encounter substantial model bias. This bias is primarily caused by imbalances in the data and the lack of relevance of certain features. This study aims to tackle these challenges by proposing an advanced machine learning (ML)-based IDS that minimizes misclassification errors and corrects model bias. As a result, the predictive accuracy and generalizability of the IDS are significantly improved. The proposed system employs advanced feature selection techniques, such as Recursive Feature Elimination (RFE), sequential feature selection (SFS), and statistical feature selection, to refine the input feature set and minimize the impact of non-predictive attributes. In addition, this work incorporates data resampling methods such as Synthetic Minority Oversampling Technique and Edited Nearest Neighbor (SMOTE_ENN), Adaptive Synthetic Sampling (ADASYN), and Synthetic Minority Oversampling Technique-Tomek Links (SMOTE_Tomek) to address class imbalance and improve the accuracy of the model. The experimental results indicate that our proposed model, especially when utilizing the random forest (RF) algorithm, surpasses existing models in accuracy, precision, recall, and F-score across different data resampling methods. Using the ADASYN resampling method, the RF model achieves an accuracy of 99.9985% for botnet attacks and 99.9777% for Man-in-the-Middle (MITM) attacks, demonstrating the effectiveness of our approach in dealing with imbalanced data distributions. This research not only improves the ability of IDSs to identify botnet and MITM attacks but also provides a scalable and efficient solution that can be used in other areas where data imbalance is a recurring problem. This work has implications beyond IDS, offering valuable insights into using ML techniques in complex real-world scenarios.
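
The resampling methods named in the abstract (SMOTE_ENN, SMOTE_Tomek, ADASYN) all build on one idea: creating synthetic minority-class samples by interpolating between existing minority samples and their nearest minority neighbours. The sketch below illustrates only that shared interpolation step in plain Python. It is not the authors' implementation; it omits the ENN and Tomek-link cleaning stages and ADASYN's density-based weighting, and the function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data (made up): 8 benign flows, 3 attack flows.
benign = [(0.1 * i, 0.2 * i) for i in range(8)]
attack = [(5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]

# Generate just enough synthetic attack samples to match the benign count.
new_attack = smote_like_oversample(attack, n_new=len(benign) - len(attack))
balanced_attack = attack + new_attack
print(len(benign), len(balanced_attack))  # prints: 8 8
```

In practice, resampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data.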


Pages: 25

50 entries in total

[1] Machine learning-based intrusion detection: feature selection versus feature extraction
Ngo, Vu-Duc
Vuong, Tuan-Cuong
Van Luong, Thien
Tran, Hung
[J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (03) : 2365 - 2379
[2] INTRUSION DETECTION BASED ON MACHINE LEARNING AND FEATURE SELECTION
Alaoui, Souad
El Gonnouni, Amina
Lyhyaoui, Abdelouahid
[J]. MENDEL 2011 - 17TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING, 2011 : 199 - 206
[3] An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection
Najafi-Zangeneh, Saeed
Shams-Gharneh, Naser
Arjomandi-Nezhad, Ali
Zolfani, Sarfaraz Hashemkhani
[J]. MATHEMATICS, 2021, 9 (11)
[4] Feature extraction for machine learning-based intrusion detection in IoT networks
Mohanad Sarhan
Siamak Layeghy
Nour Moustafa
Marcus Gallagher
Marius Portmann
[J]. Digital Communications and Networks, 2024, 10 (01) : 205 - 216
[5] Feature extraction for machine learning-based intrusion detection in IoT networks
Sarhan, Mohanad
Layeghy, Siamak
Moustafa, Nour
Gallagher, Marcus
Portmann, Marius
[J]. DIGITAL COMMUNICATIONS AND NETWORKS, 2024, 10 (01) : 205 - 216
[6] Enhancing intrusion detection in IoT networks using machine learning-based feature selection and ensemble models
Almotairi, Ayoob
Atawneh, Samer
Khashan, Osama A.
Khafajah, Nour M.
[J]. SYSTEMS SCIENCE & CONTROL ENGINEERING, 2024, 12 (01)
[7] Automatic Feature Extraction and Selection For Machine Learning Based Intrusion Detection
Liu, Jinjie
Chung, Sun Sunnie
[J]. 2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019 : 1400 - 1405
[8] A new hybrid ensemble feature selection framework for machine learning-based phishing detection system
Chiew, Kang Leng
Tan, Choon Lin
Wong, KokSheik
Yong, Kelvin S. C.
Tiong, Wei King
[J]. INFORMATION SCIENCES, 2019, 484 : 153 - 166
[9] Feature Engineering in Machine Learning-Based Intrusion Detection Systems for OT Networks
Howe, Alex
Papa, Mauricio
[J]. 2023 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING, SMARTCOMP, 2023 : 361 - 366
[10] A Deep Learning-Based Framework for Feature Extraction and Classification of Intrusion Detection in Networks
Naveed, Muhammad
Arif, Fahim
Usman, Syed Muhammad
Anwar, Aamir
Hadjouni, Myriam
Elmannai, Hela
Hussain, Saddam
Ullah, Syed Sajid
Umar, Fazlullah
[J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
