Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Citations: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC classification number
TP39 [Computer Applications]
Discipline classification codes
081203; 0835
Abstract
GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main technique for ensuring multitasking timeliness, but mobile edge GPUs expose only two priority levels for task queues, so existing methods achieve merely coarse-grained preemption: they categorize DNNs into real-time and best-effort and allow a real-time task to preempt best-effort ones. However, their efficacy diminishes significantly when other real-time tasks run concurrently, a situation already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and largely schedule work in FIFO order. Clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges multiple DNNs mainly contend for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized purely in software through innovative scheduling and novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
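The key primitive the abstract relies on, two-tier GPU stream priorities, is exposed by the standard CUDA runtime on NVIDIA-based mobile edge devices such as the Jetson family. The sketch below is only a minimal illustration of that primitive, not Pantheon's scheduler or its nested-redundancy mechanism; the kernel, buffer sizes, and the real-time/best-effort task split are hypothetical placeholders.

// Minimal CUDA sketch of two-tier stream priorities (illustration only, not Pantheon's code).
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for DNN inference work; a real workload would be a model's kernels.
__global__ void dummy_infer(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    // Query the priority levels the device exposes; the "greatest" priority is
    // the numerically smallest value.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    // One stream per tier: real-time (high priority) and best-effort (low priority).
    cudaStream_t rt_stream, be_stream;
    cudaStreamCreateWithPriority(&rt_stream, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&be_stream, cudaStreamNonBlocking, least);

    const int n = 1 << 20;
    float *buf_rt, *buf_be;
    cudaMalloc(&buf_rt, n * sizeof(float));
    cudaMalloc(&buf_be, n * sizeof(float));

    // Launch best-effort work first; when the real-time launch arrives, the GPU
    // scheduler favors blocks from the high-priority stream as resources free up.
    // This is the coarse two-tier preemption that existing methods rely on and
    // that Pantheon builds finer-grained preemption on top of.
    dummy_infer<<<(n + 255) / 256, 256, 0, be_stream>>>(buf_be, n, 20000);
    dummy_infer<<<(n + 255) / 256, 256, 0, rt_stream>>>(buf_rt, n, 2000);

    cudaDeviceSynchronize();
    printf("stream priority range: least=%d, greatest=%d\n", least, greatest);

    cudaFree(buf_rt);
    cudaFree(buf_be);
    cudaStreamDestroy(rt_stream);
    cudaStreamDestroy(be_stream);
    return 0;
}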
Pages: 465-478
Number of pages: 14
Related papers (50 in total)
  • [31] EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters
    Gao, Zhipeng
    Sun, Shan
    Zhang, Yinghan
    Mo, Zijia
    Zhao, Chen
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT II, 2022, 13156 : 317 - 333
  • [32] eDeepSave: Saving DNN Inference using Early Exit During Handovers in Mobile Edge Environment
    Ju, Weiyu
    Yuan, Dong
    Bao, Wei
    Ge, Liming
    Zhou, Bing Bing
    ACM TRANSACTIONS ON SENSOR NETWORKS, 2021, 17 (03)
  • [33] Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds
    Xu, Zichuan
    Zhao, Liqian
    Liang, Weifa
    Rana, Omer F.
    Zhou, Pan
    Xia, Qiufen
    Xu, Wenzheng
    Wu, Guowei
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (04) : 799 - 814
  • [34] Advanced cover glass defect detection and classification based on multi-DNN model
    Park, Jisu
    Riaz, Hamza
    Kim, Hyunchul
    Kim, Jungsuk
    MANUFACTURING LETTERS, 2020, 23 : 53 - 61
  • [35] Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective
    Zeng, Shulin
    Dai, Guohao
    Zhang, Niansong
    Yang, Xinhao
    Zhang, Haoyu
    Zhu, Zhenhua
    Yang, Huazhong
    Wang, Yu
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (05) : 1314 - 1328
  • [36] Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence
    Dong, Fang
    Wang, Huitian
    Shen, Dian
    Huang, Zhaowu
    He, Qiang
    Zhang, Jinghui
    Wen, Liangsheng
    Zhang, Tingting
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (09) : 5389 - 5405
  • [37] Polyform: A Versatile Architecture for Multi-DNN Execution via Spatial and Temporal Acceleration
    Yin, Lingxiang
    Ghazizadeh, Amir
    Tian, Shilin
    Louri, Ahmed
    Zheng, Hao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 166 - 169
  • [38] Temperature-Aware Sizing of Multi-Chip Module Accelerators for Multi-DNN Workloads
    Shukla, Prachi
    Aguren, Derrick
    Burd, Tom
    Coskun, Ayse K.
    Kalamatianos, John
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [39] Edge intelligence in motion: Mobility-aware dynamic DNN inference service migration with downtime in mobile edge computing
    Wang, Pu
    Ouyang, Tao
    Liao, Guocheng
    Gong, Jie
    Yu, Shuai
    Chen, Xu
JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 130