Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Cited by: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC Number
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main mechanism for ensuring multitasking timeliness, but mobile edge GPUs primarily offer only two priority levels for task queues, so existing methods achieve only coarse-grained preemption by categorizing DNNs into real-time and best-effort, permitting a real-time task to preempt best-effort ones. However, this efficacy diminishes significantly when other real-time tasks run concurrently, which is already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing and lack special preemption support, mainly following FIFO in GPU scheduling. Clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges DNNs mainly vie for a single GPU. This paper introduces Pantheon, designed to offer fine-grained preemption, enabling real-time tasks to preempt each other as well as best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized through software design by innovative scheduling and novel exploitation of the nested redundancy principle for DNN models. Evaluation on a diverse set of DNNs shows substantial improvements in deadline miss rate and accuracy of Pantheon over state-of-the-art methods.
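The coarse-grained baseline that the abstract describes can be sketched as a small scheduling simulation: an arriving real-time task preempts a running best-effort task, but real-time tasks can never preempt one another, which is exactly the limitation Pantheon targets. This is an illustrative sketch only, not Pantheon's implementation; all names and the time-stepped model are hypothetical.

```python
# Illustrative simulation of coarse-grained two-tier preemption:
# an arriving real-time (RT) task preempts a running best-effort (BE)
# task, but RT tasks never preempt each other. Hypothetical names;
# this is not the paper's actual scheduler.
RT, BE = 0, 1

def simulate(tasks):
    """tasks: list of (arrival, work, tier, name) with integer times.
    Returns {name: finish_time} under two-tier preemptive scheduling."""
    pending = sorted(tasks, key=lambda t: t[0])
    ready, running, finish, t = [], None, {}, 0
    while pending or ready or running:
        # Admit all tasks that have arrived by time t.
        while pending and pending[0][0] <= t:
            arrival, work, tier, name = pending.pop(0)
            job = [arrival, work, tier, name]
            if running and tier == RT and running[2] == BE:
                ready.append(running)  # preempt the best-effort task
                running = job
            else:
                ready.append(job)
        if running is None and ready:
            # Real-time tier first, then FIFO by arrival time.
            ready.sort(key=lambda j: (j[2], j[0]))
            running = ready.pop(0)
        if running:
            running[1] -= 1  # execute one time unit
            if running[1] == 0:
                finish[running[3]] = t + 1
                running = None
        t += 1
    return finish
```

With one best-effort task (work 5, arriving at 0) and two real-time tasks (work 2 each, arriving at 1 and 2), the first real-time task preempts the best-effort one immediately, but the second real-time task must wait for the first to complete, so its deadline slack erodes even though it is real-time.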
Pages: 465–478 (14 pages)