Investigating rarity in web attacks with ensemble learners

Cited by: 8
Authors
Zuech, Richard [1]
Hancock, John [1]
Khoshgoftaar, Taghi M. [1]
Affiliations
[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA
Funding
National Science Foundation (USA);
Keywords
Rarity; CSE-CIC-IDS2018; Intrusion detection; Web attacks; Class imbalance; Random undersampling; Big data; Ensemble learners; INTRUSION DETECTION; FRAUD DETECTION; BIG; CURVE; MODEL;
DOI
10.1186/s40537-021-00462-6
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Class rarity is a frequent challenge in cybersecurity. Rarity occurs when the positive (attack) class only has a small number of instances for machine learning classifiers to train upon, thus making it difficult for the classifiers to discriminate and learn from the positive class. To investigate rarity, we examine three individual web attacks in big data from the CSE-CIC-IDS2018 dataset: "Brute Force-Web", "Brute Force-XSS", and "SQL Injection". These three individual web attacks are also severely imbalanced, and so we evaluate whether random undersampling (RUS) treatments can improve the classification performance for these three individual web attacks. The following eight different levels of RUS ratios are evaluated: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. For measuring classification performance, Area Under the Receiver Operating Characteristic Curve (AUC) metrics are obtained for the following seven different classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR) (with the first four learners being ensemble learners and for comparison, the last three being single learners). We find that applying random undersampling does improve overall classification performance with the AUC metric in a statistically significant manner. Ensemble learners achieve the top AUC scores after massive undersampling is applied, but the ensemble learners break down and have poor performance (worse than NB and DT) when no sampling is applied to our unique and harsh experimental conditions of severe class imbalance and rarity.
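The random undersampling (RUS) treatment described in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 for attack, 0 for normal traffic) and keeps every positive instance while randomly discarding negatives until the requested majority:minority ratio is reached. The function name `random_undersample`, the seed handling, and the toy label vector are hypothetical choices made for illustration; this is not the authors' implementation, and the abstract does not state which library they used.

```python
import random

# RUS ratios named in the abstract, written as (majority parts, minority parts).
# The "no sampling" level is omitted because it leaves the data unchanged.
RUS_RATIOS = [(999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]


def random_undersample(labels, majority_parts, minority_parts, seed=0):
    """Return sorted row indices that keep every positive (label 1) and a
    random subset of negatives (label 0) sized to majority:minority."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Number of negatives needed for the requested ratio.
    target_neg = round(len(pos_idx) * majority_parts / minority_parts)
    # Undersampling never adds rows, so cap at the negatives available.
    target_neg = min(target_neg, len(neg_idx))
    kept_neg = rng.sample(neg_idx, target_neg)
    return sorted(pos_idx + kept_neg)


if __name__ == "__main__":
    # Hypothetical toy data: 10 attack rows among 100,000 rows.
    labels = [1] * 10 + [0] * 99_990
    for maj, mino in RUS_RATIOS:
        kept = random_undersample(labels, maj, mino)
        n_pos = sum(labels[i] for i in kept)
        print(f"{maj}:{mino} -> {len(kept) - n_pos} negatives, {n_pos} positives")
```

In a full experiment the returned indices would select the training rows only, with the AUC then measured on data that was not undersampled; that evaluation protocol is an assumption here, since the abstract does not spell it out.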
Pages: 27
Related papers
50 results
  • [1] Investigating rarity in web attacks with ensemble learners
    Richard Zuech
    John Hancock
    Taghi M. Khoshgoftaar
    [J]. Journal of Big Data, 8
  • [2] Detecting web attacks using random undersampling and ensemble learners
    Zuech, Richard
    Hancock, John
    Khoshgoftaar, Taghi M.
    [J]. JOURNAL OF BIG DATA, 2021, 8 (01)
  • [3] Detecting web attacks using random undersampling and ensemble learners
    Richard Zuech
    John Hancock
    Taghi M. Khoshgoftaar
    [J]. Journal of Big Data, 8
  • [4] Detecting SQL Injection Web Attacks Using Ensemble Learners and Data Sampling
    Zuech, Richard
    Hancock, John
    Khoshgoftaar, Taghi M.
    [J]. PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (IEEE CSR), 2021, : 27 - 34
  • [5] Energy Load Forecasting: Investigating Mid-Term Predictions with Ensemble Learners
    Liapis, Charalampos M.
    Karanikola, Aikaterini
    Kotsiantis, Sotiris
    [J]. ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2022, PART I, 2022, 646 : 343 - 355
  • [6] EL-RFHC: Optimized ensemble learners using RFHC for intrusion attacks classification
    Kuppusamy, P.
    Kapadia, Dev
    Manvitha, Edaboina Godha
    Dhahbi, Sami
    Iwendi, C.
    Khan, M. Ijaz
    Mohanty, Sachi Nandan
    Ben Khedher, Nidhal
    [J]. AIN SHAMS ENGINEERING JOURNAL, 2024, 15 (07)
  • [7] Mining with Rarity for Web Intelligence
    Gui, Yijie
    Gan, Wensheng
    Chen, Yao
    Wu, Yongdong
    [J]. COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 973 - 981
  • [8] Investigating class rarity in big data
    Tawfiq Hasanin
    Taghi M. Khoshgoftaar
    Joffrey L. Leevy
    Richard A. Bauder
    [J]. Journal of Big Data, 7
  • [9] Detecting Web Attacks using Stacked Denoising Autoencoder and Ensemble Learning Methods
    Truong, Dung
    Tran, Due
    Nguyen, Lam
    Mac, Hieu
    Tran, Hai Anh
    Bui, Tung
    [J]. SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019, : 267 - 272
  • [10] Investigating the use of inquiry & web-based activities with inclusive biology learners
    Bodzin, Alec M.
    Waller, Patricia L.
    Santoro, Lana Edwards
    Kale, Darlene
    [J]. AMERICAN BIOLOGY TEACHER, 2007, 69 (05): : 273 - 279