Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited by: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC Number
TP39 [Computer Applications]
Discipline Code
081203; 0835
Abstract
GPUs are increasingly used to run DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often carry strict latency requirements. Preemption is the main technique for ensuring multitasking timeliness, but mobile edge GPUs expose only two priority levels for task queues, so existing methods achieve only coarse-grained preemption: DNNs are categorized into real-time and best-effort, and a real-time task may preempt best-effort ones. This efficacy diminishes significantly when multiple real-time tasks run concurrently, a situation already common in mobile edge applications. Owing to different hardware characteristics, solutions from other platforms are unsuitable. GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and mostly schedule work FIFO; clouds handle concurrent execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges DNNs mainly contend for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized purely in software through innovative scheduling and novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
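The gap between coarse- and fine-grained preemption described in the abstract can be illustrated with a toy discrete-time scheduler simulation. This is only a sketch of the general idea, not Pantheon's actual design; the task names, workloads, and deadlines below are hypothetical. With coarse-grained preemption, real-time tasks run FIFO among themselves, so an urgent real-time task stuck behind a long one misses its deadline; with fine-grained (deadline-aware) preemption among real-time tasks, it meets it.

```python
# Toy simulation contrasting coarse-grained preemption (real-time tasks
# cannot preempt each other, so they run FIFO among themselves) with
# fine-grained preemption (earliest-deadline-first among real-time tasks).
# All parameters are illustrative, not taken from the paper.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    release: int   # arrival time
    work: int      # execution time units needed
    deadline: int  # absolute deadline
    done: int = 0  # progress so far


def simulate(tasks, fine_grained):
    """Run one time unit at a time; return names of tasks that miss deadlines."""
    t, misses = 0, []
    pending = sorted(tasks, key=lambda x: x.release)
    active = []
    while pending or active:
        active += [x for x in pending if x.release <= t]
        pending = [x for x in pending if x.release > t]
        if not active:          # idle until the next release
            t += 1
            continue
        if fine_grained:
            cur = min(active, key=lambda x: x.deadline)  # EDF: most urgent first
        else:
            cur = min(active, key=lambda x: x.release)   # FIFO: earliest arrival first
        cur.done += 1
        t += 1
        if cur.done == cur.work:
            if t > cur.deadline:
                misses.append(cur.name)
            active.remove(cur)
    return misses


def workload():
    # A long real-time task arrives first; an urgent one arrives shortly after.
    return [Task("rt_long", release=0, work=8, deadline=20),
            Task("rt_urgent", release=1, work=2, deadline=4)]


coarse_misses = simulate(workload(), fine_grained=False)  # rt_urgent waits behind rt_long
fine_misses = simulate(workload(), fine_grained=True)     # rt_urgent preempts rt_long
```

Under FIFO, `rt_urgent` cannot start until `rt_long` finishes at t=8 and misses its deadline of 4; under EDF-style preemption it runs at t=1 and finishes in time, while `rt_long` still meets its looser deadline.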
Pages: 465-478
Page count: 14