Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs

Citations: 0
Authors
Han, Lixiang [1 ]
Zhou, Zimu [2 ]
Li, Zhenjiang [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Mobile Edge Systems; GPU Scheduling; Preemption; Deep Learning
DOI
10.1145/3643832.3661878
CLC classification number
TP39 [Computer Applications]
Discipline classification codes
081203; 0835
Abstract
GPUs are increasingly utilized for running DNN tasks on emerging mobile edge devices. Beyond accelerating single-task inference, their value is particularly apparent in efficiently executing multiple DNN tasks, which often have strict latency requirements in applications. Preemption is the main technique for ensuring multitasking timeliness, but mobile edge GPUs expose only two priority levels for task queues, so existing methods achieve merely coarse-grained preemption: they categorize DNNs into real-time and best-effort and allow a real-time task to preempt best-effort ones. However, their efficacy diminishes significantly when other real-time tasks run concurrently, a situation already common in mobile edge applications. Due to different hardware characteristics, solutions from other platforms are unsuitable. For instance, GPUs on traditional mobile devices primarily assist CPU processing, lack dedicated preemption support, and largely schedule work in FIFO order. Clouds handle concurrent task execution but focus on allocating one or more GPUs per complex model, whereas on mobile edges multiple DNNs mainly contend for a single GPU. This paper introduces Pantheon, which offers fine-grained preemption, enabling real-time tasks to preempt both each other and best-effort tasks. Our key observation is that the two-tier GPU stream priorities, while underexplored, are sufficient: efficient preemption can be realized purely in software through innovative scheduling and novel exploitation of the nested redundancy principle of DNN models. Evaluation on a diverse set of DNNs shows that Pantheon substantially improves deadline miss rate and accuracy over state-of-the-art methods.
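The key primitive the abstract relies on, two-tier GPU stream priorities, is exposed by the standard CUDA runtime on NVIDIA-based mobile edge devices such as the Jetson family. The sketch below is only a minimal illustration of that primitive, not Pantheon's scheduler or its nested-redundancy mechanism; the kernel, buffer sizes, and the real-time/best-effort task split are hypothetical placeholders.

// Minimal CUDA sketch of two-tier stream priorities (illustration only, not Pantheon's code).
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for DNN inference work; a real workload would be a model's kernels.
__global__ void dummy_infer(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main() {
    // Query the priority levels the device exposes; the "greatest" priority is
    // the numerically smallest value.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    // One stream per tier: real-time (high priority) and best-effort (low priority).
    cudaStream_t rt_stream, be_stream;
    cudaStreamCreateWithPriority(&rt_stream, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&be_stream, cudaStreamNonBlocking, least);

    const int n = 1 << 20;
    float *buf_rt, *buf_be;
    cudaMalloc(&buf_rt, n * sizeof(float));
    cudaMalloc(&buf_be, n * sizeof(float));

    // Launch best-effort work first; when the real-time launch arrives, the GPU
    // scheduler favors blocks from the high-priority stream as resources free up.
    // This is the coarse two-tier preemption that existing methods rely on and
    // that Pantheon builds finer-grained preemption on top of.
    dummy_infer<<<(n + 255) / 256, 256, 0, be_stream>>>(buf_be, n, 20000);
    dummy_infer<<<(n + 255) / 256, 256, 0, rt_stream>>>(buf_rt, n, 2000);

    cudaDeviceSynchronize();
    printf("stream priority range: least=%d, greatest=%d\n", least, greatest);

    cudaFree(buf_rt);
    cudaFree(buf_be);
    cudaStreamDestroy(rt_stream);
    cudaStreamDestroy(be_stream);
    return 0;
}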
Pages: 465-478
Number of pages: 14
Related papers (50 in total)
  • [31] EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters
    Gao, Zhipeng
    Sun, Shan
    Zhang, Yinghan
    Mo, Zijia
    Zhao, Chen
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT II, 2022, 13156 : 317 - 333
  • [32] eDeepSave: Saving DNN Inference using Early Exit During Handovers in Mobile Edge Environment
    Ju, Weiyu
    Yuan, Dong
    Bao, Wei
    Ge, Liming
    Zhou, Bing Bing
    ACM TRANSACTIONS ON SENSOR NETWORKS, 2021, 17 (03)
  • [33] Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds
    Xu, Zichuan
    Zhao, Liqian
    Liang, Weifa
    Rana, Omer F.
    Zhou, Pan
    Xia, Qiufen
    Xu, Wenzheng
    Wu, Guowei
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (04) : 799 - 814
  • [34] Advanced cover glass defect detection and classification based on multi-DNN model
    Park, Jisu
    Riaz, Hamza
    Kim, Hyunchul
    Kim, Jungsuk
    MANUFACTURING LETTERS, 2020, 23 : 53 - 61
  • [35] Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective
    Zeng, Shulin
    Dai, Guohao
    Zhang, Niansong
    Yang, Xinhao
    Zhang, Haoyu
    Zhu, Zhenhua
    Yang, Huazhong
    Wang, Yu
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (05) : 1314 - 1328
  • [36] Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence
    Dong, Fang
    Wang, Huitian
    Shen, Dian
    Huang, Zhaowu
    He, Qiang
    Zhang, Jinghui
    Wen, Liangsheng
    Zhang, Tingting
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (09) : 5389 - 5405
  • [37] Polyform: A Versatile Architecture for Multi-DNN Execution via Spatial and Temporal Acceleration
    Yin, Lingxiang
    Ghazizadeh, Amir
    Tian, Shilin
    Louri, Ahmed
    Zheng, Hao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 166 - 169
  • [38] Temperature-Aware Sizing of Multi-Chip Module Accelerators for Multi-DNN Workloads
    Shukla, Prachi
    Aguren, Derrick
    Burd, Tom
    Coskun, Ayse K.
    Kalamatianos, John
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [39] Edge intelligence in motion: Mobility-aware dynamic DNN inference service migration with downtime in mobile edge computing
    Wang, Pu
    Ouyang, Tao
    Liao, Guocheng
    Gong, Jie
    Yu, Shuai
    Chen, Xu
JOURNAL OF SYSTEMS ARCHITECTURE, 2022, 130