Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection

Cited by: 1

Authors:

Fathima, Afrah [1]

Khan, Amir [1]

Uddin, Md Faizan [2]

Waris, Mohammad Maqbool [3]

Ahmad, Sultan [4]

Sanin, Cesar [5]

Szczerbicki, Edward [6]

Affiliations:

[1] Maulana Azad Natl Urdu Univ, Dept CS & IT, Hyderabad, India

[2] Chaitanya Bharathi Inst Technol, Dept AI & Data Sci, Hyderabad, India

[3] Adama Sci & Technol Univ, Dept Mech Engn, Adama, Ethiopia

[4] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Alkharj, Saudi Arabia

[5] Univ Newcastle, Dept Mech Engn, Newcastle, NSW, Australia

[6] Gdansk Univ Technol, Fac Management & Econ, Gdansk, Poland

Source:

CYBERNETICS AND SYSTEMS | 2023

Keywords:

K-nearest neighbours model; cyber threat detection; decision tree; gradient boosting; logistic regression; machine learning models; random forest; support vector machine

DOI:

10.1080/01969722.2023.2296246

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This study uses the University of New South Wales Network Based 2015 (UNSW-NB15) dataset to investigate the evolving nature of cyber threats, moving away from the obsolete Knowledge Discovery and Data Mining competition 1999 (KDD Cup99) dataset. The data preparation pipeline is designed to ensure the integrity and suitability of the data for analysis: null values are removed first, then categorical features are one-hot encoded, the data are normalized with min-max scaling, and the binary labels are label encoded. Feature selection is based on the Pearson correlation coefficient. Six machine learning models are evaluated for binary classification on accuracy, precision, recall, and F1 score. The Random Forest model stood out, with an accuracy of 99%, an F1 score of 98%, and balanced precision and recall, both at 98%. The Support Vector Machine, Gradient Boosting, Logistic Regression, Decision Tree, and K-Nearest Neighbors models also performed well, with accuracy and F1 scores of around 98%. For multi-class classification, the models examined all performed robustly, with accuracy between 97% and 98%. These results show that the models classify the data accurately, consistently achieving high precision, recall, and F1 scores for positive-case predictions. The study offers a current perspective on cyber threat detection and underlines the suitability of several machine learning models for this rapidly changing field.
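
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the helper names, the choice to correlate each feature with the label, and the 0.3 cutoff in select_features are all assumptions made for illustration, since the abstract does not state how the Pearson coefficient was applied or which threshold was used.

```python
import math

def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """One-hot encode a categorical column; returns (sorted categories, rows)."""
    cats = sorted(set(column))
    return cats, [[1 if v == c else 0 for c in cats] for v in column]

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(features, labels, threshold=0.3):
    """Keep features whose |r| with the label reaches the cutoff.

    Assumption: both the feature-vs-label criterion and the 0.3 threshold
    are hypothetical; the abstract does not specify either.
    """
    return [name for name, col in features.items()
            if abs(pearson(col, labels)) >= threshold]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

In practice the same steps would more likely be done with a library such as pandas or scikit-learn; the plain-Python version is only meant to make each operation explicit.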

Pages: 17
