Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited by: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC Number
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main mechanism for ensuring multitasking timeliness, but mobile edge GPUs primarily offer only two priority levels for task queues, so existing methods achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort, permitting a real-time task to preempt best-effort ones. However, this efficacy diminishes significantly when other real-time tasks run concurrently, which is already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing and lack special preemption support, mainly following FIFO in GPU scheduling. Clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges DNNs mainly vie for a single GPU. This paper introduces Pantheon, designed to offer fine-grained preemption, enabling real-time tasks to preempt each other as well as best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized through software design by innovative scheduling and novel exploitation of the nested redundancy principle for DNN models. Evaluation on a diverse set of DNNs shows substantial improvements in deadline miss rate and accuracy of Pantheon over state-of-the-art methods.
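The coarse-grained baseline that the abstract describes can be sketched as a small scheduling simulation: an arriving real-time task preempts a running best-effort task, but real-time tasks can never preempt one another, which is exactly the limitation Pantheon targets. This is an illustrative sketch only, not Pantheon's implementation; all names and the time-stepped model are hypothetical.

```python
# Illustrative simulation of coarse-grained two-tier preemption:
# an arriving real-time (RT) task preempts a running best-effort (BE)
# task, but RT tasks never preempt each other. Hypothetical names;
# this is not the paper's actual scheduler.
RT, BE = 0, 1

def simulate(tasks):
    """tasks: list of (arrival, work, tier, name) with integer times.
    Returns {name: finish_time} under two-tier preemptive scheduling."""
    pending = sorted(tasks, key=lambda t: t[0])
    ready, running, finish, t = [], None, {}, 0
    while pending or ready or running:
        # Admit all tasks that have arrived by time t.
        while pending and pending[0][0] <= t:
            arrival, work, tier, name = pending.pop(0)
            job = [arrival, work, tier, name]
            if running and tier == RT and running[2] == BE:
                ready.append(running)  # preempt the best-effort task
                running = job
            else:
                ready.append(job)
        if running is None and ready:
            # Real-time tier first, then FIFO by arrival time.
            ready.sort(key=lambda j: (j[2], j[0]))
            running = ready.pop(0)
        if running:
            running[1] -= 1  # execute one time unit
            if running[1] == 0:
                finish[running[3]] = t + 1
                running = None
        t += 1
    return finish
```

With one best-effort task (work 5, arriving at 0) and two real-time tasks (work 2 each, arriving at 1 and 2), the first real-time task preempts the best-effort one immediately, but the second real-time task must wait for the first to complete, so its deadline slack erodes even though it is real-time.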
Pages: 465–478 (14 pages)