Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC number
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
GPUs are increasingly used to run DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often carry strict latency requirements in applications. Preemption is the main technology for ensuring multitasking timeliness, but mobile edge platforms offer only two priority levels for task queues. Existing methods therefore achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort classes, permitting a real-time task to preempt best-effort ones. This efficacy diminishes significantly, however, when other real-time tasks run concurrently, a situation already common in mobile edge applications. Owing to differing hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and largely follow FIFO scheduling; clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges, DNNs mainly vie for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized in software through innovative scheduling and a novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
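The "two-tier GPU stream priorities" the abstract builds on correspond to the two effective priority levels that the CUDA runtime exposes on NVIDIA edge GPUs. The following is a minimal sketch of that generic mechanism, not Pantheon's implementation; the kernel, buffer, and stream names are illustrative placeholders, and error checking is omitted for brevity. It shows the primitive a software scheduler can build on: kernels launched on a higher-priority stream are dispatched ahead of queued work on a lower-priority stream.

// Minimal sketch of two-tier CUDA stream priorities (illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dnnLayerKernel(float *buf, int n) {
    // Stand-in for one DNN layer's computation.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 0.5f + 1.0f;
}

int main() {
    // Query the priority range; on most GPUs this yields only a few usable
    // levels, and schedulers typically get exactly two distinct tiers.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("priority range: least=%d greatest=%d\n", least, greatest);

    // One high-priority stream for the promoted (real-time) task, one
    // low-priority stream for everything else.
    cudaStream_t hiStream, loStream;
    cudaStreamCreateWithPriority(&hiStream, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&loStream, cudaStreamNonBlocking, least);

    const int n = 1 << 20;
    float *bufA, *bufB;
    cudaMalloc(&bufA, n * sizeof(float));
    cudaMalloc(&bufB, n * sizeof(float));

    // Work on hiStream is dispatched ahead of queued work on loStream,
    // approximating preemption at kernel-launch granularity: the
    // low-priority task yields between kernels, not mid-kernel.
    dim3 block(256), grid((n + 255) / 256);
    dnnLayerKernel<<<grid, block, 0, loStream>>>(bufB, n);  // best-effort task
    dnnLayerKernel<<<grid, block, 0, hiStream>>>(bufA, n);  // real-time task

    cudaDeviceSynchronize();
    cudaFree(bufA);
    cudaFree(bufB);
    cudaStreamDestroy(hiStream);
    cudaStreamDestroy(loStream);
    return 0;
}

Note that CUDA priorities are numerically inverted (the greatest priority is the smallest value returned by cudaDeviceGetStreamPriorityRange), and prioritization takes effect only at kernel boundaries, so a scheduler must break a DNN into per-layer or finer kernel launches to create preemption points.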
Pages: 465-478
Page count: 14