Inference serving with end-to-end latency SLOs over dynamic edge networks

Cited by: 1
Authors
Nigade, Vinod [1]
Bauszat, Pablo [1]
Bal, Henri [1]
Wang, Lin [1]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
Dutch Research Council
Keywords
Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; VIDEO;
DOI
10.1007/s11241-024-09418-4
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests have to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data adaptation and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms that continuously map users to DNNs and adapt the DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
Pages: 239-290
Page count: 52