InferFair: Towards QoS-aware scheduling for performance isolation guarantee in heterogeneous model serving systems

Authors
Peng, Yaqiong [1]
Peng, Haocheng [1]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Lushan Rd, Changsha 410082, Hunan, Peoples R China
Keywords
Deep Neural Network; Inference services; GPU scheduling; Quality of services; Performance isolation;
DOI
10.1016/j.future.2023.08.020
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
With the popularity of Deep Neural Network (DNN) models in diverse fields, DNN inference services have been widely deployed in the cloud so that resource-limited devices can support intelligent applications. Serving DNN inference often requires GPU acceleration to meet the latency targets of interactive applications. A common approach to improving GPU utilization is to let multiple models share a GPU. However, this approach can degrade both the responsiveness and the throughput of model serving systems, because concurrent DNN inference tasks contend for GPU resources. In addition, interference among heterogeneous DNN inference tasks can cause a performance isolation problem, in which heterogeneous models suffer different levels of serving performance degradation. Existing works fail to ensure performance isolation among users of heterogeneous DNN models. To solve this problem, we propose InferFair, a QoS-aware scheduling framework for ensuring performance isolation in heterogeneous model serving systems. InferFair rests on two key designs: (1) periodically estimating the effective throughput requirements of all active models online and (2) applying fine-grained adjustments to minimize the differences in the impact of GPU sharing across heterogeneous model services. We conduct extensive experiments on a variety of DNN models to demonstrate the effectiveness of InferFair. Compared to a prior competitor named Clockwork, InferFair alleviates the performance isolation problem by up to 1.7x and improves overall goodput by up to 25.6%. © 2023 Elsevier B.V. All rights reserved.
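The abstract does not give InferFair's algorithm, so the following is only a toy sketch of the general idea it describes: measure how unevenly GPU sharing degrades each model, then make a small adjustment in favor of the most-degraded one. Everything here is an assumption made for illustration and not taken from the paper: the `ModelStats` fields, the slowdown ratio as the degradation measure, the max/min gap as the isolation measure, the model names, and the fixed `step` for shifting GPU time share.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    isolated_goodput: float  # requests/s meeting the latency target when running alone (assumed > 0)
    shared_goodput: float    # requests/s meeting the latency target while sharing the GPU (assumed > 0)


def slowdown(m: ModelStats) -> float:
    """Degradation caused by sharing: 1.0 means no loss, 2.0 means goodput halved."""
    return m.isolated_goodput / m.shared_goodput


def isolation_gap(models: list[ModelStats]) -> float:
    """Ratio between the worst and best slowdown; 1.0 means all models degrade equally."""
    slowdowns = [slowdown(m) for m in models]
    return max(slowdowns) / min(slowdowns)


def adjust_shares(shares: dict[str, float], models: list[ModelStats],
                  step: float = 0.05) -> dict[str, float]:
    """Move a small GPU time share from the least-degraded to the most-degraded model."""
    worst = max(models, key=slowdown)
    best = min(models, key=slowdown)
    new_shares = dict(shares)
    if worst.name == best.name:
        return new_shares  # nothing to rebalance
    delta = min(step, new_shares[best.name])  # never drive a share below zero
    new_shares[best.name] -= delta
    new_shares[worst.name] += delta
    return new_shares


if __name__ == "__main__":
    models = [
        ModelStats("model_a", isolated_goodput=100.0, shared_goodput=80.0),
        ModelStats("model_b", isolated_goodput=50.0, shared_goodput=25.0),
    ]
    print("isolation gap:", isolation_gap(models))
    print("new shares:", adjust_shares({"model_a": 0.5, "model_b": 0.5}, models))
```

In this sketch, calling `adjust_shares` once per measurement period would mimic a periodic estimate followed by a fine-grained correction; a real scheduler would also need to re-measure goodput after each adjustment.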
Pages: 10-20
Number of pages: 11