Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection

Cited by: 0

Authors:

Lin, Ying-Dar [1]

Liu, Zi-Qiang [1]

Hwang, Ren-Hung [1,2]

Nguyen, Van-Linh [2,3]

Lin, Po-Ching [2]

Lai, Yuan-Cheng [4]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan

[2] Natl Chung Cheng Univ, Dept Comp Sci & Informat Engn, Chiayi 621, Taiwan

[3] Univ Informat & Commun Technol, Dept Informat Technol, Thai Nguyen 25000, Vietnam

[4] Natl Taiwan Univ Sci & Technol, Dept Informat Management, Taipei 106, Taiwan

Source:

IEEE Access | 2022 / Volume 10

Keywords:

Training; Data models; Intrusion detection; Machine learning; Soft sensors; Predictive models; Explosions; Imbalanced dataset; machine learning; variational autoencoder; intrusion detection; ANOMALY DETECTION; BOTNET DETECTION; SECURITY; PERFORMANCE;

DOI:

10.1109/ACCESS.2022.3149295

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

As a result of the explosion of security attacks and the complexity of modern networks, machine learning (ML) has recently become the favored approach for intrusion detection systems (IDS). However, the ML approach usually faces three challenges: massive attack variants, imbalanced data, and appropriate data segmentation. Improper handling of these issues significantly degrades ML performance, e.g., resulting in high false-negative rates and low recall. Despite many efforts in the literature, detecting security attacks in a complicated network environment with imperfect data collection is still an open issue. This work proposes a machine learning framework that combines a variational autoencoder with a multilayer perceptron model to deal with imbalanced datasets and detect the explosion of attack variants on the Internet. The detection engine also includes an efficient range-based sequential search algorithm to address the segmentation challenge when pre-processing data from multiple sources (network packets, system/statistic logs). Our work is the first attempt to demonstrate the effect of using an appropriate combination of ML models for boosting IDS detection capability in a heterogeneous environment, where data collection imperfection is common. Experimental results on a public system log dataset (e.g., HDFS) show that our method achieves up to approximately 97% F1 score and 98% recall, a promising result compared with the same measurements for other solutions. Moreover, the proposed treatment of imbalanced datasets improves the F1 score by up to 35% and the recall rate by up to 27%. The testing results also indicate that our model can detect new attack variants.
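
The abstract's remark that imbalanced data leads to high false-negative rates and low recall can be illustrated with a small sketch. This is not the authors' framework or data: the 95/5 class split, the two toy detectors, and the helper name `precision_recall_f1` are all assumptions made for illustration. It only shows why recall and F1, rather than accuracy, are the metrics reported for imbalanced intrusion data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (attack) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced test set: 95 normal samples (0), 5 attacks (1).
y_true = [0] * 95 + [1] * 5

# Toy detector A: always predicts "normal".
pred_a = [0] * 100
accuracy_a = sum(t == p for t, p in zip(y_true, pred_a)) / len(y_true)
_, recall_a, f1_a = precision_recall_f1(y_true, pred_a)

# Toy detector B: 2 false alarms, catches 4 of the 5 attacks.
pred_b = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1
_, recall_b, f1_b = precision_recall_f1(y_true, pred_b)

print(f"A: accuracy={accuracy_a:.2f} recall={recall_a:.2f} F1={f1_a:.2f}")
print(f"B: recall={recall_b:.2f} F1={f1_b:.2f}")
```

Detector A looks accurate only because normal samples dominate, yet it misses every attack. Detector B is the more useful one, and only recall and F1 make that visible.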

Pages: 15247-15260

Number of pages: 14

50 records in total

[1] Machine Learning with Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection
Lin, Ying-Dar
Liu, Zi-Qiang
Hwang, Ren-Hung
Nguyen, Van-Linh
Lin, Po-Ching
Lai, Yuan-Cheng
[J]. IEEE Access, 2022, 10 : 15247 - 15260
[2] Intrusion detection with autoencoder based deep learning machine
Kaynar, Oguz
Yuksek, Ahmet Gurkan
Gormez, Yasin
Isik, Yunus Emre
[J]. 2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
[3] A survey of intrusion detection from the perspective of intrusion datasets and machine learning techniques
Singh G.
Khare N.
[J]. International Journal of Computers and Applications, 2022, 44 (07) : 659 - 669
[4] Investigating Network Intrusion Detection Datasets Using Machine Learning
Amaizu, Gabriel Chukwunonso
Nwakanma, Cosmas Ifeanyi
Lee, Jae-Min
Kim, Dong-Seong
[J]. 11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1325 - 1328
[5] Resampling imbalanced data for network intrusion detection datasets
Sikha Bagui
Kunqi Li
[J]. Journal of Big Data, 8
[6] Resampling imbalanced data for network intrusion detection datasets
Bagui, Sikha
Li, Kunqi
[J]. JOURNAL OF BIG DATA, 2021, 8 (01)
[7] Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning
Liu, Lan
Wang, Pengcheng
Lin, Jun
Liu, Langzhou
[J]. IEEE Access, 2021, 9 : 7550 - 7563
[8] Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning
Liu, Lan
Wang, Pengcheng
Lin, Jun
Liu, Langzhou
[J]. IEEE ACCESS, 2021, 9 : 7550 - 7563
[9] A Hybrid Machine Learning Methodology for Imbalanced Datasets
Lipitakis, Anastasia-Dimitra
Kotsiantis, Sotirios
[J]. 5TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS AND APPLICATIONS, IISA 2014, 2014, : 252 - +
[10] Variational Autoencoder Based Synthetic Data Generation for Imbalanced Learning
Wan, Zhiqiang
Zhang, Yazhou
He, Haibo
[J]. 2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1500 - 1506
