Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC number
TP39 [Computer applications]
Discipline code
081203; 0835
Abstract
GPUs are increasingly used to run DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often carry strict latency requirements in applications. Preemption is the main technology for ensuring multitasking timeliness, but mobile edge platforms offer only two priority levels for task queues. Existing methods therefore achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort classes, permitting a real-time task to preempt best-effort ones. This efficacy diminishes significantly, however, when other real-time tasks run concurrently, a situation already common in mobile edge applications. Owing to differing hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and largely follow FIFO scheduling; clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges, DNNs mainly vie for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized in software through innovative scheduling and a novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
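The "two-tier GPU stream priorities" the abstract builds on correspond to the two effective priority levels that the CUDA runtime exposes on NVIDIA edge GPUs. The following is a minimal sketch of that generic mechanism, not Pantheon's implementation; the kernel, buffer, and stream names are illustrative placeholders, and error checking is omitted for brevity. It shows the primitive a software scheduler can build on: kernels launched on a higher-priority stream are dispatched ahead of queued work on a lower-priority stream.

// Minimal sketch of two-tier CUDA stream priorities (illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dnnLayerKernel(float *buf, int n) {
    // Stand-in for one DNN layer's computation.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 0.5f + 1.0f;
}

int main() {
    // Query the priority range; on most GPUs this yields only a few usable
    // levels, and schedulers typically get exactly two distinct tiers.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("priority range: least=%d greatest=%d\n", least, greatest);

    // One high-priority stream for the promoted (real-time) task, one
    // low-priority stream for everything else.
    cudaStream_t hiStream, loStream;
    cudaStreamCreateWithPriority(&hiStream, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&loStream, cudaStreamNonBlocking, least);

    const int n = 1 << 20;
    float *bufA, *bufB;
    cudaMalloc(&bufA, n * sizeof(float));
    cudaMalloc(&bufB, n * sizeof(float));

    // Work on hiStream is dispatched ahead of queued work on loStream,
    // approximating preemption at kernel-launch granularity: the
    // low-priority task yields between kernels, not mid-kernel.
    dim3 block(256), grid((n + 255) / 256);
    dnnLayerKernel<<<grid, block, 0, loStream>>>(bufB, n);  // best-effort task
    dnnLayerKernel<<<grid, block, 0, hiStream>>>(bufA, n);  // real-time task

    cudaDeviceSynchronize();
    cudaFree(bufA);
    cudaFree(bufB);
    cudaStreamDestroy(hiStream);
    cudaStreamDestroy(loStream);
    return 0;
}

Note that CUDA priorities are numerically inverted (the greatest priority is the smallest value returned by cudaDeviceGetStreamPriorityRange), and prioritization takes effect only at kernel boundaries, so a scheduler must break a DNN into per-layer or finer kernel launches to create preemption points.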
Pages: 465-478
Page count: 14