Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

Cited by: 9
Authors
Bronzino, Francesco [1]
Schmitt, Paul [2]
Ayoubi, Sara [3]
Kim, Hyojoon [4]
Teixeira, Renata [5]
Feamster, Nick [6]
Affiliations
[1] Univ Savoie Mt Blanc, LISTIC, Annecy Le Vieux, France
[2] USC Informat Sci Inst, Los Angeles, CA USA
[3] Nokia Bell Labs, Paris Saclay, France
[4] Princeton Univ, Princeton, NJ 08544 USA
[5] Inria, Paris, France
[6] Univ Chicago, Chicago, IL 60637 USA
Keywords
network systems; network traffic; QoS inference; malware detection;
DOI
10.1145/3491052
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
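The joint evaluation described in the abstract can be pictured as choosing an operating point on a cost/accuracy trade-off. The following is a minimal sketch of that idea only, not the Traffic Refinery implementation: the `Representation` type, the helper functions, the representation names, and all numbers are hypothetical placeholders and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One candidate feature representation (all fields are illustrative)."""
    name: str
    cost: float      # hypothetical systems cost, arbitrary units (lower is better)
    accuracy: float  # hypothetical model accuracy in [0, 1] (higher is better)


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates, sorted by cost.

    A candidate is dominated if another one is at least as cheap and at least
    as accurate, and strictly better on one of the two dimensions.
    """
    front = []
    for c in candidates:
        dominated = any(
            o.cost <= c.cost
            and o.accuracy >= c.accuracy
            and (o.cost < c.cost or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda r: r.cost)


def cheapest_meeting(candidates, min_accuracy):
    """Pick the cheapest non-dominated candidate that reaches min_accuracy."""
    feasible = [c for c in pareto_front(candidates) if c.accuracy >= min_accuracy]
    return feasible[0] if feasible else None


# Placeholder values for illustration only; they are NOT measurements.
CANDIDATES = [
    Representation("packet_counters", cost=1.0, accuracy=0.80),
    Representation("flow_statistics", cost=3.0, accuracy=0.90),
    Representation("per_packet_timing", cost=4.0, accuracy=0.88),
    Representation("full_payload", cost=10.0, accuracy=0.91),
]

if __name__ == "__main__":
    for rep in pareto_front(CANDIDATES):
        print(rep.name, rep.cost, rep.accuracy)
    print(cheapest_meeting(CANDIDATES, min_accuracy=0.85))
```

In this sketch a single scalar stands in for cost; the abstract names feature extraction and model training as separate cost sources, so a fuller comparison would likely track them separately.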
Pages: 24