Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited by: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC Number
TP39 [Computer Applications]
Discipline Code
081203; 0835
Abstract
GPUs are increasingly used to run DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often carry strict latency requirements. Preemption is the main technique for ensuring multitasking timeliness, but mobile edge GPUs expose only two priority levels for task queues, so existing methods achieve only coarse-grained preemption: DNNs are categorized into real-time and best-effort, and a real-time task may preempt best-effort ones. This efficacy diminishes significantly when multiple real-time tasks run concurrently, a situation already common in mobile edge applications. Owing to different hardware characteristics, solutions from other platforms are unsuitable. GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and mostly schedule work FIFO; clouds handle concurrent execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges DNNs mainly contend for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized purely in software through innovative scheduling and novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
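The gap between coarse- and fine-grained preemption described in the abstract can be illustrated with a toy discrete-time scheduler simulation. This is only a sketch of the general idea, not Pantheon's actual design; the task names, workloads, and deadlines below are hypothetical. With coarse-grained preemption, real-time tasks run FIFO among themselves, so an urgent real-time task stuck behind a long one misses its deadline; with fine-grained (deadline-aware) preemption among real-time tasks, it meets it.

```python
# Toy simulation contrasting coarse-grained preemption (real-time tasks
# cannot preempt each other, so they run FIFO among themselves) with
# fine-grained preemption (earliest-deadline-first among real-time tasks).
# All parameters are illustrative, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    release: int   # arrival time
    work: int      # execution time units needed
    deadline: int  # absolute deadline
    done: int = 0  # progress so far


def simulate(tasks, fine_grained):
    """Run one time unit at a time; return names of tasks that miss deadlines."""
    t, misses = 0, []
    pending = sorted(tasks, key=lambda x: x.release)
    active = []
    while pending or active:
        active += [x for x in pending if x.release <= t]
        pending = [x for x in pending if x.release > t]
        if not active:          # idle until the next release
            t += 1
            continue
        if fine_grained:
            cur = min(active, key=lambda x: x.deadline)  # EDF: most urgent first
        else:
            cur = min(active, key=lambda x: x.release)   # FIFO: earliest arrival first
        cur.done += 1
        t += 1
        if cur.done == cur.work:
            if t > cur.deadline:
                misses.append(cur.name)
            active.remove(cur)
    return misses


def workload():
    # A long real-time task arrives first; an urgent one arrives shortly after.
    return [Task("rt_long", release=0, work=8, deadline=20),
            Task("rt_urgent", release=1, work=2, deadline=4)]


coarse_misses = simulate(workload(), fine_grained=False)  # rt_urgent waits behind rt_long
fine_misses = simulate(workload(), fine_grained=True)     # rt_urgent preempts rt_long
```

Under FIFO, `rt_urgent` cannot start until `rt_long` finishes at t=8 and misses its deadline of 4; under EDF-style preemption it runs at t=1 and finishes in time, while `rt_long` still meets its looser deadline.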
Pages: 465-478
Page count: 14