Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection

Cited by: 1

Authors:

Fathima, Afrah [1]

Khan, Amir [1]

Uddin, Md Faizan [2]

Waris, Mohammad Maqbool [3]

Ahmad, Sultan [4]

Sanin, Cesar [5]

Szczerbicki, Edward [6]

Affiliations:

[1] Maulana Azad Natl Urdu Univ, Dept CS & IT, Hyderabad, India

[2] Chaitanya Bharathi Inst Technol, Dept AI & Data Sci, Hyderabad, India

[3] Adama Sci & Technol Univ, Dept Mech Engn, Adama, Ethiopia

[4] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Alkharj, Saudi Arabia

[5] Univ Newcastle, Dept Mech Engn, Newcastle, NSW, Australia

[6] Gdansk Univ Technol, Fac Management & Econ, Gdansk, Poland

Source:

CYBERNETICS AND SYSTEMS | 2023

Keywords:

K-nearest neighbours model; cyber threat detection; decision tree; gradient boosting; logistic regression; machine learning models; random forest; support vector machine

DOI:

10.1080/01969722.2023.2296246

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This research uses the University of New South Wales Network Based 2015 (UNSW-NB15) dataset to investigate the dynamic nature of cyber threats, departing from the obsolete Knowledge Discovery and Data Mining competition 1999 (KDD Cup99) dataset. The data preparation pipeline consists of essential steps that ensure the integrity and suitability of the data for analysis: null values are removed, categorical features are one-hot encoded, the data are normalized with min-max scaling, and the binary labels are label-encoded. Feature selection is based on the Pearson correlation coefficient. Six machine learning models are evaluated for binary classification on accuracy, precision, recall, and F1 score. The Random Forest model performed best, with an accuracy of 99%, an F1 score of 98%, and balanced precision and recall at 98%. The Support Vector Machine, Gradient Boosting, Logistic Regression, Decision Tree, and K-Nearest Neighbors models also performed well, with accuracy and F1 scores of around 98%. In the multi-class classification experiments, all of the models examined performed robustly, with accuracy ranging from 97% to 98%. These results highlight the ability of the models to classify the data accurately, consistently achieving high precision, recall, and F1 scores for positive-case predictions. The study offers a current perspective on cyber threat detection and emphasizes the suitability of several machine learning models for this rapidly changing field.
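The preparation steps and evaluation measures named in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the toy records, the field names (`proto`, `dur`, `label`), the choice to correlate each feature with the label, the 0.5 threshold, and the example predictions are all assumptions made for illustration. A real study would more likely use a library such as pandas or scikit-learn on the actual UNSW-NB15 files.

```python
from math import sqrt


def min_max_scale(values):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant column; treated as 0 here
    return cov / sqrt(var_x * var_y)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy records; field names and values are illustrative, not taken from UNSW-NB15.
records = [
    {"proto": "tcp", "dur": 0.2, "label": "normal"},
    {"proto": "udp", "dur": 1.0, "label": "attack"},
    {"proto": "tcp", "dur": None, "label": "normal"},  # dropped: contains a null
    {"proto": "udp", "dur": 0.8, "label": "attack"},
    {"proto": "tcp", "dur": 0.1, "label": "normal"},
]

# 1. Remove rows that contain null values.
clean = [r for r in records if all(v is not None for v in r.values())]

# 2. Label-encode the binary target (attack = 1, normal = 0).
labels = [1 if r["label"] == "attack" else 0 for r in clean]

# 3. One-hot encode the categorical feature and min-max scale the numeric one.
categories, proto_rows = one_hot([r["proto"] for r in clean])
features = {"dur": min_max_scale([r["dur"] for r in clean])}
for i, category in enumerate(categories):
    features["proto_" + category] = [row[i] for row in proto_rows]

# 4. Keep features whose absolute correlation with the label meets an
#    assumed threshold (the abstract does not state the criterion used).
THRESHOLD = 0.5
selected = [name for name, column in features.items()
            if abs(pearson(column, labels)) >= THRESHOLD]

# 5. Score hypothetical predictions from some classifier.
metrics = binary_metrics(labels, [0, 1, 0, 0])
print(selected, metrics)
```

Precision and recall are reported for the positive (attack) class, which matches the abstract's reference to positive-case predictions; for the multi-class setting, the same counts would be computed per class and then averaged.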


Pages: 17

