Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection

Times cited: 1
Authors
Fathima, Afrah [1]
Khan, Amir [1]
Uddin, Md Faizan [2]
Waris, Mohammad Maqbool [3]
Ahmad, Sultan [4]
Sanin, Cesar [5]
Szczerbicki, Edward [6]
Affiliations
[1] Maulana Azad Natl Urdu Univ, Dept CS & IT, Hyderabad, India
[2] Chaitanya Bharathi Inst Technol, Dept AI & Data Sci, Hyderabad, India
[3] Adama Sci & Technol Univ, Dept Mech Engn, Adama, Ethiopia
[4] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Alkharj, Saudi Arabia
[5] Univ Newcastle, Dept Mech Engn, Newcastle, NSW, Australia
[6] Gdansk Univ Technol, Fac Management & Econ, Gdansk, Poland
Keywords
K-nearest neighbours model; cyber threat detection; decision tree; gradient boosting; logistic regression; machine learning models; random forest; support vector machine
DOI
10.1080/01969722.2023.2296246
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This research uses the University of New South Wales Network Based 2015 (UNSW-NB15) dataset to investigate the dynamic nature of cyber threats, departing from the obsolete Knowledge Discovery and Data Mining Cup 1999 (KDD Cup99) dataset. The data preparation pipeline consists of essential procedures that ensure the integrity and suitability of the data for analysis. It begins by removing null values, then applies one-hot encoding to categorical features, min-max scaling for data normalization, and label encoding for efficient handling of binary labels. Feature selection is performed using the Pearson correlation coefficient. Six machine learning models are evaluated exhaustively for binary classification, using accuracy, precision, recall, and F1 score as the key performance measures. The Random Forest model achieved an accuracy of 99% and an F1 score of 98%, with well-balanced precision and recall of 98%. The Support Vector Machine, Gradient Boosting, Logistic Regression, Decision Tree, and K-Nearest Neighbors models also performed well, achieving accuracy and F1 scores of around 98%. In the multi-class classification experiments, all of the machine learning models examined performed robustly, with accuracy ranging from 97% to 98%. These results highlight the efficacy of these models in classifying the data accurately, consistently achieving high precision, recall, and F1 scores for positive-case predictions. This study offers a current perspective on the identification of cyber threats and emphasizes the suitability of several machine learning models in this rapidly changing field.
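The preprocessing and evaluation steps named in the abstract can be sketched roughly as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. Records are assumed to be dictionaries in which `None` marks a missing value. The Pearson-based selection rule is also an assumption: the sketch keeps a feature when the absolute correlation with the binary label meets a threshold, because the abstract does not state the actual rule or threshold. The toy records borrow UNSW-NB15 column names (`proto`, `sbytes`, `label`), but their values are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]


def drop_null_rows(rows: List[Row]) -> List[Row]:
    """Step 1: discard any record that has a missing (None) field."""
    return [r for r in rows if all(v is not None for v in r.values())]


def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """Step 2: expand a categorical column into one 0/1 indicator column per category."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Step 3: rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def label_encode(labels: Sequence[str]) -> List[int]:
    """Step 4: map each distinct class name to an integer, in sorted name order."""
    index = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [index[name] for name in labels]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_by_correlation(features: Dict[str, List[float]], target: Sequence[int],
                          threshold: float) -> List[str]:
    """Step 5 (assumed rule): keep features whose |r| with the label is >= threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating class 1 (attack) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative toy records: UNSW-NB15 column names, made-up values.
rows = [
    {"proto": "tcp", "sbytes": 100.0, "label": 0},
    {"proto": "udp", "sbytes": 900.0, "label": 1},
    {"proto": "tcp", "sbytes": None, "label": 0},  # dropped in step 1
    {"proto": "udp", "sbytes": 700.0, "label": 1},
    {"proto": "tcp", "sbytes": 300.0, "label": 0},
]
clean = drop_null_rows(rows)
target = [r["label"] for r in clean]
features = {"sbytes": min_max_scale([r["sbytes"] for r in clean])}  # [0.0, 1.0, 0.75, 0.25]
features.update(one_hot([r["proto"] for r in clean]))
kept = select_by_correlation(features, target, threshold=0.95)  # hypothetical threshold
# kept == ["tcp", "udp"]; sbytes has r of about 0.949 and falls below the threshold
```

In practice the same steps are usually done with scikit-learn (`MinMaxScaler`, `OneHotEncoder`, `LabelEncoder`, `precision_recall_fscore_support`). When a train/test split is used, fit the scaler's minimum and maximum on the training portion only, so that test-set statistics do not leak into training.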
Pages: 17