InferFair: Towards QoS-aware scheduling for performance isolation guarantee in heterogeneous model serving systems

Authors
Peng, Yaqiong [1]
Peng, Haocheng [1]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Lushan Rd, Changsha 410082, Hunan, Peoples R China
Keywords
Deep Neural Network; Inference services; GPU scheduling; Quality of services; Performance isolation
DOI
10.1016/j.future.2023.08.020
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the popularity of Deep Neural Network (DNN) models in diverse fields, DNN inference services have been widely deployed in the cloud to support intelligent applications on resource-limited devices. Serving DNN inference often requires GPU acceleration to meet the latency targets of interactive workloads. A common approach to improving GPU utilization is to let multiple models share a GPU. However, this approach can degrade both the responsiveness and the throughput of model serving systems, because concurrent DNN inference tasks contend for GPU resources. In addition, interference among heterogeneous DNN inference tasks can cause a performance isolation problem, in which heterogeneous models suffer different levels of serving performance degradation. Existing works fail to ensure performance isolation among users of heterogeneous DNN models. To solve this problem, we propose InferFair, a QoS-aware scheduling framework for ensuring performance isolation in heterogeneous model serving systems. InferFair focuses on two key designs: (1) periodically estimating the effective throughput requirements of all active models online and (2) applying fine-grained adjustments to minimize differences in how GPU sharing affects heterogeneous model services. We conduct intensive experiments on a variety of DNN models to demonstrate the effectiveness of InferFair. Compared to a prior competitor named Clockwork, InferFair alleviates the performance isolation problem by up to 1.7x and improves the overall goodput by up to 25.6%. © 2023 Elsevier B.V. All rights reserved.
Pages: 10-20
Number of pages: 11