Active learning approach to label network traffic datasets

Cited by: 16
Authors
Guerra Torres, Jorge L. [1]
Catania, Carlos A. [2]
Veas, Eduardo [3]
Affiliations
[1] Natl Univ Cuyo, Inst Informat Technol & Commun, Mendoza, Argentina
[2] Natl Univ Cuyo, Sch Engn, LABSIN, Mendoza, Argentina
[3] Graz Univ Technol, Inst Interact Syst & Data Sci, Graz, Austria
Keywords
Active learning; Labeling network; Random Forest; Learning rate; Noise robustness
DOI
10.1016/j.jisa.2019.102388
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the field of network security, labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytics applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since label assignment still requires an expert to weigh several factors, the annotation process remains a difficult task. The present article introduces a novel active learning strategy for building a random forest model based on connections previously labeled by the user. The resulting model provides the user with probability estimates for the remaining unlabeled connections, assisting in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also its learning rate and resilience against noise, as well as its improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results showed that the proposed approach is a significant improvement over previous labeling strategies. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 13