Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection

Cited by: 0
Authors
Lin, Ying-Dar [1]
Liu, Zi-Qiang [1]
Hwang, Ren-Hung [1, 2]
Nguyen, Van-Linh [2, 3]
Lin, Po-Ching [2]
Lai, Yuan-Cheng [4]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[2] Natl Chung Cheng Univ, Dept Comp Sci & Informat Engn, Chiayi 621, Taiwan
[3] Univ Informat & Commun Technol, Dept Informat Technol, Thai Nguyen 25000, Vietnam
[4] Natl Taiwan Univ Sci & Technol, Dept Informat Management, Taipei 106, Taiwan
Source
IEEE Access | 2022 / Vol. 10
Keywords
Training; Data models; Intrusion detection; Machine learning; Soft sensors; Predictive models; Explosions; Imbalanced dataset; machine learning; variational autoencoder; intrusion detection; ANOMALY DETECTION; BOTNET DETECTION; SECURITY; PERFORMANCE
DOI
10.1109/ACCESS.2022.3149295
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
As a result of the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive attack variants, imbalanced data, and appropriate data segmentation. Improper handling of these issues significantly degrades ML performance, e.g., resulting in high false-negative rates and low recall. Despite many efforts in the literature, detecting security attacks in a complicated network environment with imperfect data collection is still an open issue. This work proposes a machine learning framework that combines a variational autoencoder with a multilayer perceptron model to deal with imbalanced datasets and detect the explosion of attack variants on the Internet. The detection engine also includes an efficient range-based sequential search algorithm to address the segmentation challenge in pre-processing data from multiple sources (network packets, system/statistic logs). Our work is the first attempt to demonstrate the effect of using an appropriate combination of ML models to boost IDS detection capability in a heterogeneous environment, where imperfect data collection is common. Experimental results on a public system log dataset (e.g., HDFS) show that our method achieves up to approximately 97% F1 score and 98% recall, a promising result compared with the same measurements for other solutions. Moreover, we found that the proposed treatment of imbalanced datasets can improve the F1 score by up to 35% and the recall by up to 27%. The testing results also indicate that our model can detect new attack variants.
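The abstract reports results as F1 score and recall instead of plain accuracy. The following minimal sketch illustrates why those metrics matter on an imbalanced dataset. The labels are a hypothetical toy example (95 normal records, 5 attacks), not data from HDFS, and the helper `evaluate` is an illustrative name, not part of the authors' framework.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, correctly passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced ground truth: 95 normal records, 5 attacks.
y_true = [0] * 95 + [1] * 5

# Classifier A (assumed): always predicts "normal".
y_majority = [0] * 100

# Classifier B (assumed): raises 2 false alarms, catches 4 of the 5 attacks.
y_better = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

for name, y_pred in [("always-normal", y_majority), ("detector", y_better)]:
    m = evaluate(y_true, y_pred)
    print(f"{name}: accuracy={m['accuracy']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

In this toy setting the always-normal classifier scores high accuracy while missing every attack, which only recall and F1 expose. That is the failure mode the abstract attributes to imbalanced training data.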
Pages: 15247-15260
Page count: 14