InferFair: Towards QoS-aware scheduling for performance isolation guarantee in heterogeneous model serving systems

Authors
Peng, Yaqiong [1]
Peng, Haocheng [1]
Affiliation
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Lushan Rd, Changsha 410082, Hunan, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2024 / Vol. 150
Keywords
Deep Neural Network; Inference services; GPU scheduling; Quality of services; Performance isolation
DOI
10.1016/j.future.2023.08.020
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
With the popularity of Deep Neural Network (DNN) models in diverse fields, DNN inference services have been widely deployed in the cloud to support intelligent applications on resource-limited devices. Serving DNN inference often requires GPU acceleration to meet the latency targets of interactive workloads. A common approach to improving GPU utilization is to let multiple models share a GPU. However, this approach can degrade both the responsiveness and the throughput of model serving systems, because concurrent DNN inference tasks contend for GPU resources. In addition, interference among heterogeneous DNN inference tasks can cause a performance isolation problem, in which heterogeneous models suffer different levels of serving performance degradation. Existing works fail to ensure performance isolation among users of heterogeneous DNN models. To solve this problem, we propose InferFair, a QoS-aware scheduling framework for ensuring performance isolation in heterogeneous model serving systems. InferFair focuses on two key designs: (1) periodically estimating the effective throughput requirements of all active models online and (2) applying fine-grained adjustments to minimize the differences in the impact of GPU sharing on heterogeneous model services. We conduct extensive experiments on a variety of DNN models to demonstrate the effectiveness of InferFair. Compared to a prior competitor named Clockwork, InferFair alleviates the performance isolation problem by up to 1.7x and improves the overall goodput by up to 25.6%. © 2023 Elsevier B.V. All rights reserved.
Pages: 10-20
Page count: 11