Investigating rarity in web attacks with ensemble learners

Cited by: 8
Authors
Zuech, Richard [1]
Hancock, John [1]
Khoshgoftaar, Taghi M. [1]
Affiliations
[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA
Funding
National Science Foundation (USA)
Keywords
Rarity; CSE-CIC-IDS2018; Intrusion detection; Web attacks; Class imbalance; Random undersampling; Big data; Ensemble learners; INTRUSION DETECTION; FRAUD DETECTION; BIG; CURVE; MODEL
DOI
10.1186/s40537-021-00462-6
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Class rarity is a frequent challenge in cybersecurity. Rarity occurs when the positive (attack) class has only a small number of instances for machine learning classifiers to train on, which makes it difficult for the classifiers to learn to discriminate the positive class. To investigate rarity, we examine three individual web attacks in big data from the CSE-CIC-IDS2018 dataset: "Brute Force-Web", "Brute Force-XSS", and "SQL Injection". These three web attacks are also severely imbalanced, so we evaluate whether random undersampling (RUS) can improve classification performance for them. Eight RUS ratios are evaluated: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. To measure classification performance, we obtain Area Under the Receiver Operating Characteristic Curve (AUC) scores for seven classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR). The first four are ensemble learners; the last three are single learners included for comparison. We find that random undersampling improves overall classification performance, as measured by AUC, in a statistically significant manner. Ensemble learners achieve the top AUC scores after massive undersampling is applied, but they break down and perform poorly (worse than NB and DT) when no sampling is applied under our unique and harsh experimental conditions of severe class imbalance and rarity.
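As an illustration of the two ideas the abstract relies on, the following is a minimal sketch of random undersampling to a target negative:positive ratio and of AUC computed as a pairwise ranking probability. It is not the authors' implementation: the function names `random_undersample` and `auc`, the seed, and the toy data (10 attacks among 10,000 instances) are all assumptions made for this example, and only the Python standard library is used.

```python
import random

def random_undersample(labels, majority_parts, minority_parts, seed=0):
    """Return sorted indices kept after randomly discarding negatives (label 0)
    until negatives:positives is about majority_parts:minority_parts.
    Every positive (label 1) instance is kept. Hypothetical helper."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Negatives implied by the target ratio, capped at the number available.
    target_neg = min(len(neg), round(len(pos) * majority_parts / minority_parts))
    rng = random.Random(seed)
    kept_neg = rng.sample(neg, target_neg)
    return sorted(pos + kept_neg)

def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is scored above
    a randomly chosen negative; ties count as one half. Hypothetical helper."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (assumed for illustration, not taken from CSE-CIC-IDS2018).
labels = [1] * 10 + [0] * 9990

# The seven sampled ratios listed in the abstract ("no sampling" needs no call).
for maj, mino in [(999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]:
    kept = random_undersample(labels, maj, mino)
    n_neg = sum(1 for i in kept if labels[i] == 0)
    print(f"{maj}:{mino} keeps {n_neg} negatives and 10 positives")
```

In a typical evaluation, undersampling of this kind would be applied only to the training portion of the data, with AUC measured on untouched test data; whether the paper follows exactly that protocol is not stated in the abstract.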
Pages: 27