Inference serving with end-to-end latency SLOs over dynamic edge networks

Cited by: 1
Authors
Nigade, Vinod [1]
Bauszat, Pablo [1]
Bal, Henri [1]
Wang, Lin [1]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; VIDEO
DOI
10.1007/s11241-024-09418-4
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
Pages: 239-290
Number of pages: 52
Related papers
50 items in total
  • [21] Dynamic Migration of Microservices for End-to-End Latency Control in 5G/6G Networks
    Kiranpreet Kaur
    Fabrice Guillemin
    Francoise Sailhan
    Journal of Network and Systems Management, 2023, 31
  • [22] Analysis of Strategies for Minimising End-to-End Latency in 5G Networks
    Carvalho, Afonso
    Correia, Luis M.
    Grilo, Antonio
    Dinis, Ricardo
    2022 INTERNATIONAL CONFERENCE ON BROADBAND COMMUNICATIONS FOR NEXT GENERATION NETWORKS AND MULTIMEDIA APPLICATIONS (COBCOM), 2022
  • [23] Order/Radix Problem: Towards Low End-to-End Latency Interconnection Networks
    Yasudo, Ryota
    Koibuchi, Michihiro
    Nakano, Koji
    Matsutani, Hiroki
    Amano, Hideharu
    2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2017, : 322 - 331
  • [24] A machine learning-based optimization for end-to-end latency in TSN networks
    Bezerra, Daniel
    Filho, Assis T. de Oliveira
    Rodrigues, Iago Richard
    Dantas, Marrone
    Barbosa, Gibson
    Souza, Ricardo
    Kelner, Judith
    Sadok, Djamel
    COMPUTER COMMUNICATIONS, 2022, 195 : 424 - 440
  • [25] Inferring End-to-End Latency in Live Videos
    Wang, Hengchao
    Zhang, Xu
    Chen, Hao
    Xu, Yiling
    Ma, Zhan
    IEEE TRANSACTIONS ON BROADCASTING, 2022, 68 (02) : 517 - 529
  • [26] Dynamic end-to-end QoS support for video over the Internet
    Bai, Y.
    Chu, Y.
    Ito, M. R.
    AEU-INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATIONS, 2011, 65 (05) : 385 - 391
  • [27] Secure End-to-End Communication over GSM and PSTN Networks
    Islam, Saad
    Ajmal, Fatima
    Ali, Salman
    Zahid, Jawad
    Rashdi, Adnan
    2009 IEEE INTERNATIONAL CONFERENCE ON ELECTRO/INFORMATION TECHNOLOGY, 2009, : 321 - 324
  • [28] Secure End-to-End SMS Communication over GSM Networks
    Islam, Saad
    Ul Haq, Inam
    Saeed, Amna
    2015 12TH INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGY (IBCAST), 2015, : 286 - 292
  • [29] MODELING END-TO-END PROTOCOLS OVER INTERCONNECTED HETEROGENEOUS NETWORKS
    WOLISZ, A
    POPESCUZELETIN, R
    COMPUTER COMMUNICATIONS, 1992, 15 (01) : 11 - 22
  • [30] End-to-end delay of videoconferencing over packet switched networks
    Baldi, M
    Ofek, Y
    IEEE INFOCOM '98 - THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS. 1-3: GATEWAY TO THE 21ST CENTURY, 1998, : 1084 - 1092