Research on Network Intrusion Detection Based on SMOTE Algorithm and Machine Learning

Cited by: 0
Authors
Zhang Y. [1]
Zhang T. [1]
Chen J. [1]
Wang Y. [1]
Zou Q. [1]
Affiliation
[1] China Information Technology Security Evaluation Center, Beijing
Source
Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology | 2019, Vol. 39, No. 12
Keywords
Data rebalancing; Machine learning; Network intrusion detection; SMOTE algorithm
DOI
10.15918/j.tbit1001-0645.2018.423
Abstract
Machine learning models are widely used in network intrusion detection, but researchers tend to focus on model selection and parameter optimization and rarely consider the impact of data imbalance, which often leads to poor detection of intrusion classes that have few samples. To address this problem, this study focuses on the SMOTE (synthetic minority oversampling technique) data rebalancing algorithm. Taking the intrusion detection data set KDD99 as the original training set, a simple sampling method and the SMOTE algorithm were used to generate a rebalanced training set. A variety of machine learning models were then evaluated with 5-fold cross-validation on the original training set and the rebalanced training set respectively. Experimental results show that, compared with the original training set, modeling on the rebalanced training set improves the recognition accuracy and recall rate of minority-class samples by about 10% to 20%, while maintaining or even improving the recognition of majority-class samples. Therefore, the SMOTE algorithm can significantly improve network intrusion detection under imbalanced samples. © 2019, Editorial Department of Transaction of Beijing Institute of Technology. All rights reserved.
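The interpolation step at the core of SMOTE can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and the sketch assumes purely numeric feature vectors (the categorical fields of KDD99 would need to be encoded or handled separately). Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Hypothetical helper for illustration only; assumes numeric features.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position along the base-neighbour segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region the minority class already occupies, which is what distinguishes SMOTE from simply duplicating minority records.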
Pages: 1258-1262
Page count: 4
Related papers
13 in total
  • [1] Moorthy M., Sathiyabama S., A study of intrusion detection using data mining, International Conference on Advances in Engineering, Science and Management, pp. 8-15, (2012)
  • [2] Hu C., Theory and Technology of Network Intrusion Detection, (2010)
  • [3] Pietraszek T., Tanner A., Data mining and machine learning-towards reducing false positives in intrusion detection, Information Security Technical Report, 10, 3, pp. 169-183, (2005)
  • [4] Tesfahun A., Bhaskari D.L., Intrusion detection using random forests classifier with SMOTE and feature reduction, International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, pp. 127-132, (2014)
  • [5] Li L., Yu Y., Bai S., et al., Intrusion detection model based on double training technique, Transactions of Beijing Institute of Technology, 12, pp. 1246-1252, (2017)
  • [6] Batista G.E.A.P.A., Prati R.C., Monard M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, 6, 1, pp. 20-29, (2004)
  • [7] Chawla N.V., Bowyer K.W., Hall L.O., et al., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 1, pp. 321-357, (2002)
  • [8] Han H., Wang W.Y., Mao B.H., Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning, International Conference on Advances in Intelligent Computing, pp. 878-887, (2005)
  • [9] Wilson D.L., Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems Man & Cybernetics, SMC-2, 3, pp. 408-421, (1972)
  • [10] Zhang X., Zeng H., Jia L., Research on intrusion detection data set KDD CUP99, Computer Engineering and Design, 31, 22, pp. 4809-4812, (2010)