QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks

Cited by: 24
Authors
Fang, Zhou [1]
Yu, Tong [2]
Mengshoel, Ole J. [2]
Gupta, Rajesh K. [1]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
DOI
10.1145/3132847.3133045
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep neural networks (DNNs) are popular in diverse fields such as computer vision and natural language processing. DNN inference tasks are emerging as a service provided by cloud computing environments. However, cloud-hosted DNN inference faces new challenges in scheduling workloads for the best Quality of Service (QoS), because QoS depends on batch size, model complexity, and resource allocation. This paper represents the QoS metric as a utility function of response delay and inference accuracy. We first propose a simple and effective heuristic approach that keeps response delay low and satisfies the requirement on processing throughput. We then describe an advanced deep reinforcement learning (RL) approach that learns to schedule from experience. The RL scheduler is trained to maximize QoS, using a set of system statuses as the input to the RL policy model. Our approach performs scheduling actions only when there are free GPUs, and thus reduces scheduling overhead compared with common RL schedulers that run at every continuous time step. We evaluate the schedulers on a simulation platform and demonstrate the advantages of RL over heuristics.
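The abstract gives neither the utility function nor the scheduling policies, so the following is only a rough illustration of two ideas it names: scoring each request by a utility of response delay and inference accuracy, and making scheduling decisions only at the moments a GPU becomes free. Everything specific here is an assumption made for the sketch, not the authors' method: the linear delay penalty in `utility`, the `deadline` parameter, the fixed `batch_time`, and the greedy "batch whatever has arrived" rule, which stands in for both the paper's heuristic and its RL policy.

```python
import heapq


def utility(delay, accuracy, deadline=1.0):
    """Hypothetical QoS utility: accuracy scaled by a linear delay penalty.

    The paper's actual utility function is not given in the abstract.
    """
    return accuracy * max(0.0, 1.0 - delay / deadline)


def simulate(arrivals, num_gpus=2, batch_size=4, batch_time=0.2, accuracy=0.9):
    """Return the mean utility under a greedy, event-driven batching policy.

    A decision is taken only when a GPU is free, rather than at every
    time step. All parameters are illustrative assumptions.
    """
    pending = sorted(arrivals)          # request arrival times (seconds)
    if not pending:
        return 0.0
    free_at = [0.0] * num_gpus          # min-heap of times at which GPUs free up
    heapq.heapify(free_at)
    total = 0.0
    i = 0
    while i < len(pending):
        gpu_free = heapq.heappop(free_at)
        # If the queue is empty when the GPU frees up, wait for the next arrival.
        start = max(gpu_free, pending[i])
        # Batch every request that has arrived by `start`, up to batch_size.
        j = i
        while j < len(pending) and j - i < batch_size and pending[j] <= start:
            j += 1
        finish = start + batch_time
        for arrival in pending[i:j]:
            total += utility(finish - arrival, accuracy)
        heapq.heappush(free_at, finish)
        i = j
    return total / len(pending)


if __name__ == "__main__":
    print(simulate([0.0, 0.05, 0.1, 0.3, 0.35, 0.9]))
```

In this sketch the number of scheduling decisions grows with the number of batches dispatched, not with the number of simulated time steps, which is the overhead argument the abstract makes for acting only when GPUs are free.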
Pages: 2067-2070
Number of pages: 4