Inference serving with end-to-end latency SLOs over dynamic edge networks

Times cited: 1

Authors:

Nigade, Vinod [1]

Bauszat, Pablo [1]

Bal, Henri [1]

Wang, Lin [1]

Affiliation:

[1] Vrije Univ Amsterdam, Amsterdam, Netherlands

Source:

REAL-TIME SYSTEMS | 2024 / Vol. 60 / Issue 02

Funding:

Dutch Research Council;

Keywords:

Inference serving; DNN adaptation; Data adaptation; Dynamic edge networks; Dynamic DNNs; VIDEO;

DOI:

10.1007/s11241-024-09418-4

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when the request has to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that we fulfill latency SLOs while maximizing the overall inference accuracy. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
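The accuracy/latency tradeoff described in the abstract can be illustrated with a simplified, single-user sketch. It is not Jellyfish's algorithm, which coordinates data and DNN adaptation collectively across many users; it only shows the underlying budget check. It assumes that each hypothetical DNN variant is paired with an input size (standing in for data adaptation), and that end-to-end latency is approximated as upload time plus a fixed round-trip time plus profiled inference time. The names `DnnVariant`, `pick_variant`, and `EXAMPLE_VARIANTS` and all profile numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DnnVariant:
    # Hypothetical profile of one DNN variant; values are illustrative only.
    name: str
    input_bytes: int      # size of the (possibly downscaled) input sent upstream
    inference_ms: float   # assumed profiled server-side inference latency
    accuracy: float       # assumed profiled accuracy (e.g. mAP)


def pick_variant(variants: Sequence[DnnVariant], bandwidth_bps: float,
                 slo_ms: float, rtt_ms: float = 0.0) -> Optional[DnnVariant]:
    """Return the most accurate variant whose estimated end-to-end latency
    (upload time + RTT + inference time) fits within the SLO, or None."""
    best = None
    for v in variants:
        upload_ms = v.input_bytes * 8 / bandwidth_bps * 1000.0
        total_ms = upload_ms + rtt_ms + v.inference_ms
        if total_ms <= slo_ms and (best is None or v.accuracy > best.accuracy):
            best = v
    return best


# Hypothetical profiles: larger inputs and models are slower but more accurate.
EXAMPLE_VARIANTS = [
    DnnVariant("small", 20_000, 10.0, 0.30),
    DnnVariant("medium", 60_000, 25.0, 0.40),
    DnnVariant("large", 150_000, 60.0, 0.48),
]
```

With a 100 ms SLO, `pick_variant(EXAMPLE_VARIANTS, 10_000_000, 100)` selects "medium" (the "large" input alone takes 120 ms to upload at 10 Mbit/s), the same call at 50 Mbit/s selects "large", and at 1 Mbit/s it returns `None` because even the smallest input misses the budget. A real system would re-run such a decision as bandwidth estimates change and would also account for server-side queuing.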


Pages: 239-290

Number of pages: 52
