Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets

Cited by: 12
Authors
Bulavas, Viktoras [1]
Marcinkevicius, Virginijus [1]
Ruminski, Jacek [2]
Affiliations
[1] Vilnius Univ, Inst Data Sci & Digital Technol, Akad Str 4, LT-08663 Vilnius, Lithuania
[2] Gdansk Univ Technol, Fac Elect Telecommun & Informat, 11-12 Gabriela Narutowicza, PL-80233 Gdansk, Poland
Keywords
network intrusion detection; multi-class classification; imbalanced learning; bias and variance decomposition; SMOTE
DOI
10.15388/21-INFOR457
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper is devoted to the problem of class imbalance in machine learning, focusing on the detection of rare intrusion classes in computer networks. The class imbalance problem occurs when examples of one class heavily outnumber examples of the other classes. In this paper, we are particularly interested in classifiers, since pattern recognition and anomaly detection can be solved as classification problems. Because most of the network traffic in any organization is still benign and malignant traffic is rare, researchers have to deal with the class imbalance problem. Substantial research has been undertaken to identify the methods or data features that allow these attacks to be identified accurately. However, the usual tactic for dealing with class imbalance is to label all malignant traffic as one class and then solve a binary classification problem. In this paper, we instead choose not to group or drop rare classes, and investigate what can be done to achieve good multi-class classification efficiency. Rare class records were up-sampled to preset target ratios using the SMOTE method (Chawla et al., 2002). Experiments were performed on three network traffic datasets, namely CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), with the aim of reliably recognizing the rare malignant classes available in these datasets. Popular machine learning algorithms were chosen and compared on their readiness to support rare class detection. The algorithms' hyperparameters were tuned over a wide range of values, different data feature selection methods were used, and tests were executed with and without oversampling to evaluate multi-class classification performance on the rare classes.
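The oversampling step can be illustrated with a minimal, dependency-free sketch of SMOTE's interpolation idea: each synthetic record is placed at a random point on the line segment between a minority-class record and one of its k nearest minority-class neighbours. This is not the authors' code. The function name `smote_oversample`, the plain Euclidean distance, the random choice of base records (rather than looping over every record a fixed number of times), and the way a preset ratio would be turned into a `target_count` are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote_oversample(minority: Sequence[Vector], target_count: int,
                     k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: return the minority records plus synthetic SMOTE-style
    records until target_count rows exist (unchanged if already at or above it)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Brute-force neighbour search: quadratic, but minority classes are small.
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    out = [list(x) for x in minority]
    while len(out) < target_count:
        i = rng.randrange(len(minority))   # random minority record as the base
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        out.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return out
```

For example, with 100 majority records and a hypothetical preset ratio of 0.1, `smote_oversample(minority, int(0.1 * 100), k=2)` tops a three-record class up to 10 records. In practice a library implementation such as imbalanced-learn's `SMOTE`, whose `sampling_strategy` parameter accepts per-class target counts, would normally be used instead.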
Rankings of the machine learning algorithms based on Precision, Balanced Accuracy Score, $\bar{G}$, and the bias and variance decomposition of prediction error show that decision tree ensembles (AdaBoost, Random Forest Trees and Gradient Boosting Classifier) performed best on the network intrusion datasets used in this research.
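To make the ranking criteria concrete, here is a minimal stdlib-only sketch computing balanced accuracy (the unweighted mean of per-class recalls), a geometric mean of per-class recalls, macro-averaged precision, and a bias/variance split of the 0-1 loss from the predictions of models trained on different resamples of the training data. The abstract does not define these quantities precisely, so several choices are assumptions: that $\bar{G}$ is the geometric mean of per-class recalls, that precision is macro-averaged, and that bias and variance follow a common 0-1 loss definition in which the "main prediction" is the most frequent label across runs. All function names are hypothetical.

```python
from collections import Counter
from math import prod
from typing import Dict, Hashable, List, Sequence, Tuple

Labels = Sequence[Hashable]


def per_class_recall(y_true: Labels, y_pred: Labels) -> Dict[Hashable, float]:
    """Recall of every class that occurs in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}


def balanced_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-class recalls: a rare class counts as much as benign traffic."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def geometric_mean_recall(y_true: Labels, y_pred: Labels) -> float:
    """Geometric mean of per-class recalls (assumed meaning of G-bar); 0 if any class is missed."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1.0 / len(recalls))


def macro_precision(y_true: Labels, y_pred: Labels) -> float:
    """Macro-averaged precision; a class that is never predicted scores 0 (assumption)."""
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    classes = set(y_true) | set(y_pred)
    scores = [hits[c] / predicted[c] if predicted[c] else 0.0 for c in classes]
    return sum(scores) / len(scores)


def bias_variance_01(y_true: Labels, runs: List[Labels]) -> Tuple[float, float]:
    """Average bias and variance of the 0-1 loss.

    runs[r][i] is the prediction for test point i from a model trained on the
    r-th resample of the training data. The main prediction is the most frequent
    label; bias is 1 where it is wrong, and variance is the share of runs that
    disagree with it.
    """
    bias = variance = 0.0
    for i, truth in enumerate(y_true):
        preds = [run[i] for run in runs]
        main = Counter(preds).most_common(1)[0][0]
        bias += float(main != truth)
        variance += sum(p != main for p in preds) / len(preds)
    n = len(y_true)
    return bias / n, variance / n
```

Balanced accuracy and the geometric mean both weight every class equally, which is why they are informative when benign traffic dominates; the geometric mean additionally drops to 0 as soon as any single class is never recognized.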
Pages: 441-475
Number of pages: 35
Related papers (50 in total)
  • [1] DEFECTNET: MULTI-CLASS FAULT DETECTION ON HIGHLY-IMBALANCED DATASETS
    Anantrasirichai, N.
    Bull, David
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 2481 - 2485
  • [2] Microclustering-Based Multi-Class Classification on Imbalanced Multi-Relational Datasets
    Pant, Hemlata
    Srivastava, Reena
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING, 2022, 17 (01)
  • [3] Dynamic ensemble selection for multi-class imbalanced datasets
    Garcia, Salvador
    Zhang, Zhong-Liang
    Altalhi, Abdulrahman
    Alshomrani, Saleh
    Herrera, Francisco
    [J]. INFORMATION SCIENCES, 2018, 445 : 22 - 37
  • [4] Survey on Highly Imbalanced Multi-class Data
    Hamid, Hakim Abdul
    Yusoff, Marina
    Mohamed, Azlinah
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 211 - 229
  • [5] Performance Analysis of Binarization Strategies for Multi-class Imbalanced Data Classification
    Zak, Michal
    Wozniak, Michal
    [J]. COMPUTATIONAL SCIENCE - ICCS 2020, PT IV, 2020, 12140 : 141 - 155
  • [6] Dealing with Imbalanced Data in Multi-class Network Intrusion Detection Systems Using XGBoost
    AL-Essa, Malik
    Appice, Annalisa
    [J]. MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2021, 1525 : 5 - 21
  • [7] A survey of multi-class imbalanced data classification methods
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2471 - 2501
  • [8] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [9] A Combination Method for Multi-Class Imbalanced Data Classification
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    [J]. 2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 365 - 368
  • [10] A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets
    ThulasiBikku
    Rao, Sambasiva
    Akepogu, Ananda Rao
    [J]. INTERNATIONAL CONFERENCE ON MATERIALS, ALLOYS AND EXPERIMENTAL MECHANICS (ICMAEM-2017), 2017, 225