Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection

Times cited: 1
Authors
Fathima, Afrah [1]
Khan, Amir [1]
Uddin, Md Faizan [2]
Waris, Mohammad Maqbool [3]
Ahmad, Sultan [4]
Sanin, Cesar [5]
Szczerbicki, Edward [6]
Affiliations
[1] Maulana Azad Natl Urdu Univ, Dept CS & IT, Hyderabad, India
[2] Chaitanya Bharathi Inst Technol, Dept AI & Data Sci, Hyderabad, India
[3] Adama Sci & Technol Univ, Dept Mech Engn, Adama, Ethiopia
[4] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Alkharj, Saudi Arabia
[5] Univ Newcastle, Dept Mech Engn, Newcastle, NSW, Australia
[6] Gdansk Univ Technol, Fac Management & Econ, Gdansk, Poland
Keywords
K-nearest neighbours model; cyber threat detection; decision tree; gradient boosting; logistic regression; machine learning models; random forest; support vector machine
DOI
10.1080/01969722.2023.2296246
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This research work uses the University of New South Wales Network-Based 2015 (UNSW-NB15) dataset to investigate the dynamic nature of cyber threats, departing from the outdated Knowledge Discovery and Data Mining Cup 1999 (KDD Cup 99) dataset. The data preparation pipeline consists of essential procedures that ensure the integrity and suitability of the data for analysis. First, null values are removed. One-hot encoding is then applied to categorical features, min-max scaling is used for data normalization, and label encoding is used to manage the binary labels efficiently. Feature selection is performed using the Pearson correlation coefficient. An exhaustive evaluation of six machine learning models is conducted for binary classification, using accuracy, precision, recall, and F1 score as the key performance measures. The Random Forest model performed exceptionally well, with an accuracy of 99% and an F1 score of 98%, along with well-balanced precision and recall of 98%. The Support Vector Machine, Gradient Boosting, Logistic Regression, Decision Tree, and K-Nearest Neighbors models also performed well, achieving accuracy and F1 scores of around 98%. In the multi-class classification experiments, all of the machine learning models examined performed robustly, with accuracy ranging from 97% to 98%. These results highlight the efficacy of the models in classifying the data accurately, with consistently high precision, recall, and F1 scores for positive-case predictions. This study offers a current perspective on the identification of cyber threats and emphasizes the suitability of several machine learning models in this rapidly changing field.
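The preprocessing steps and evaluation metrics listed in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' code, which this record does not include. The following are all assumptions:

- the toy records and their values;
- the column names `proto`, `sbytes` and `dur`, borrowed loosely from UNSW-NB15;
- the class names `benign` and `malicious`;
- the 0.5 correlation threshold;
- the hardcoded predictions.

The abstract does not say which Pearson variant was used. The sketch shows one common choice: keep a feature when its absolute correlation with the binary label meets the threshold.

```python
import math

# Hypothetical toy records loosely shaped like UNSW-NB15 rows (assumed column
# names proto, sbytes, dur); every value is invented for illustration.
RAW = [
    {"proto": "tcp", "sbytes": 200.0, "dur": 0.5, "label": "benign"},
    {"proto": "udp", "sbytes": 900.0, "dur": 0.6, "label": "malicious"},
    {"proto": "tcp", "sbytes": None, "dur": 0.55, "label": "benign"},
    {"proto": "udp", "sbytes": 1000.0, "dur": 0.4, "label": "malicious"},
    {"proto": "tcp", "sbytes": 300.0, "dur": 0.7, "label": "benign"},
]


def drop_nulls(rows):
    """Step 1: discard any record that has a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot(rows, column):
    """Step 2: replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1.0 if r[column] == c else 0.0
        out.append(new)
    return out


def min_max(rows, columns):
    """Step 3: rescale each listed column to [0, 1] via (x - min) / (max - min)."""
    out = [dict(r) for r in rows]
    for col in columns:
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        for r in out:
            r[col] = (r[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out


def label_encode(values):
    """Step 4: map class names to integers in sorted order (benign -> 0, malicious -> 1)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(rows, y, names, threshold=0.5):
    """Step 5 (assumed variant): keep features with |r(feature, label)| >= threshold."""
    return [f for f in names if abs(pearson([r[f] for r in rows], y)) >= threshold]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


rows = min_max(one_hot(drop_nulls(RAW), "proto"), ["sbytes", "dur"])
y, classes = label_encode([r.pop("label") for r in rows])  # removes label from features
features = select_features(rows, y, sorted(rows[0]))
# Hypothetical predictions standing in for the output of any trained classifier.
scores = binary_metrics(y, [0, 1, 0, 0])
```

Here `features` evaluates to `["proto_tcp", "proto_udp", "sbytes"]`. The `dur` column is dropped because its absolute correlation with the label is only about 0.45. In practice these steps are usually done with pandas and scikit-learn (for example `pandas.get_dummies`, `MinMaxScaler`, `LabelEncoder` and `precision_recall_fscore_support`). The standard-library version keeps each formula visible.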
Pages: 17