Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

Cited by: 9
Authors
Bronzino, Francesco [1]
Schmitt, Paul [2]
Ayoubi, Sara [3]
Kim, Hyojoon [4]
Teixeira, Renata [5]
Feamster, Nick [6]
Affiliations
[1] Univ Savoie Mt Blanc, LISTIC, Annecy Le Vieux, France
[2] USC Informat Sci Inst, Los Angeles, CA USA
[3] Nokia Bell Labs, Paris Saclay, France
[4] Princeton Univ, Princeton, NJ 08544 USA
[5] Inria, Paris, France
[6] Univ Chicago, Chicago, IL 60637 USA
Keywords
network systems; network traffic; QoS inference; malware detection
DOI
10.1145/3491052
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, designing and evaluating these models requires understanding not only model accuracy but also the systems costs of deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables joint evaluation of both conventional machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing the systems costs of feature extraction and model training against model accuracy.
Pages: 24