Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks

Cited by: 13
Authors
Mozo, Alberto [1]
Gonzalez-Prieto, Angel [2, 3]
Pastor, Antonio [1, 4]
Gomez-Canaval, Sandra [1]
Talavera, Edgar [1]
Affiliations
[1] Univ Politecn Madrid, Madrid, Spain
[2] Univ Complutense Madrid, Madrid, Spain
[3] Inst Ciencias Matemat CSIC UAM UCM UC3M, Madrid, Spain
[4] Telefonica I D, Madrid, Spain
DOI
10.1038/s41598-022-06057-2
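Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract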
Due to the rise of cyber attacks on the Internet, the demand for accurate intrusion detection systems (IDS) to counter these threats is increasing. To this end, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, their applicability is limited by two important issues: (i) the shortage of network traffic datasets for attack analysis, and (ii) the privacy constraints on the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of GAN training, none of the existing solutions can generate high-quality, fully synthetic data that can totally substitute real data in the training of ML components. Instead, they mix real with synthetic data, so the synthetic data acts only as data augmentation, which leads to privacy breaches because real data is still used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN, both with respect to the real data and with respect to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics to select the best-performing generator during GAN training, leading to a novel stopping criterion that can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic flow-based data for cryptomining attacks and normal traffic using an enhanced version of a Wasserstein GAN. The results show that the generated synthetic network traffic can completely replace real data when training an ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
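The generator-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' heuristic: the abstract does not specify the metrics or the selection rule, so the `Checkpoint` record, the `distance_to_real` and `task_f1` fields, the `max_distance` threshold, and the tie-breaking rule are all hypothetical names and choices introduced here for illustration. The sketch assumes that, at each saved training epoch, one similarity score against real data (lower is better) and one downstream ML score (higher is better) have already been computed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Checkpoint:
    """One saved generator snapshot with two hypothetical quality scores."""
    epoch: int
    distance_to_real: float  # assumed similarity metric vs. real data, lower is better
    task_f1: float           # assumed downstream detector score, higher is better


def select_best_checkpoint(checkpoints: Iterable[Checkpoint],
                           max_distance: float) -> Checkpoint:
    """Pick the snapshot with the best task score among those close enough to real data.

    Ties on the task score are broken in favour of the smaller distance.
    This rule is an illustrative assumption, not the paper's criterion.
    """
    candidates = [c for c in checkpoints if c.distance_to_real <= max_distance]
    if not candidates:
        raise ValueError("no checkpoint satisfies the distance threshold")
    return max(candidates, key=lambda c: (c.task_f1, -c.distance_to_real))


# Made-up scores for four snapshots, purely for demonstration.
history = [
    Checkpoint(epoch=10, distance_to_real=0.40, task_f1=0.95),
    Checkpoint(epoch=20, distance_to_real=0.15, task_f1=0.90),
    Checkpoint(epoch=30, distance_to_real=0.10, task_f1=0.90),
    Checkpoint(epoch=40, distance_to_real=0.05, task_f1=0.80),
]
best = select_best_checkpoint(history, max_distance=0.2)
print(best.epoch)
```

Selecting a snapshot after the fact, rather than stopping at the last epoch, reflects the abstract's point that GAN training converges poorly, so the final generator is not necessarily the best one.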
Pages: 27
Related papers
50 in total
  • [1] Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks
    Alberto Mozo
    Ángel González-Prieto
    Antonio Pastor
    Sandra Gómez-Canaval
    Edgar Talavera
    [J]. Scientific Reports, 12
  • [2] Flow-based network traffic generation using Generative Adversarial Networks
    Ring, Markus
    Schloer, Daniel
    Landes, Dieter
    Hotho, Andreas
    [J]. COMPUTERS & SECURITY, 2019, 82 : 156 - 172
  • [3] Synthetic Intrusion Alert Generation through Generative Adversarial Networks
    Sweet, Christopher
    Moskal, Stephen
    Yang, Shanchieh Jay
    [J]. MILCOM 2019 - 2019 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2019,
  • [4] Adversarial Robustness of Flow-Based Generative Models
    Pope, Phillip
    Balaji, Yogesh
    Feizi, Soheil
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 3795 - 3804
  • [5] Generation of Synthetic Data with Conditional Generative Adversarial Networks
    Vega-Marquez, Belen
    Rubio-Escudero, Cristina
    Nepomuceno-Chamorro, Isabel
    [J]. LOGIC JOURNAL OF THE IGPL, 2022, 30 (02) : 252 - 262
  • [6] Synthetic Traffic Generation with Wasserstein Generative Adversarial Networks
    Wu, Chao-Lun
    Chen, Yu-Ying
    Chou, Po-Yu
    Wang, Chih-Yu
    [J]. 2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022 : 1503 - 1508
  • [7] Investigating on the robustness of flow-based intrusion detection system against adversarial samples using Generative Adversarial Networks
    Duy, Phan The
    Khoa, Nghi Hoang
    Hien, Do Thi Thu
    Hoang, Hien Do
    Pham, Van-Hau
    [J]. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2023, 74
  • [8] Supporting Database Constraints in Synthetic Data Generation based on Generative Adversarial Networks
    Li, Wanxin
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020 : 2875 - 2877
  • [9] Synthetic Fingerprint Generation Using Generative Adversarial Networks: A Review
    Dhaneshwar, Ritika
    Taya, Arnav
    Kaur, Mandeep
    [J]. FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 1, CIS 2023, 2024, 868 : 375 - 387
  • [10] Generative Adversarial Networks applied to synthetic financial scenarios generation
    Rizzato, Matteo
    Wallart, Julien
    Geissler, Christophe
    Morizet, Nicolas
    Boumlaik, Noureddine
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2023, 623