Design of an adaptive GPU sharing and scheduling scheme in container-based cluster

Cited by: 0
Authors
Qichen Chen
Jisun Oh
Seoyoung Kim
Yoonhee Kim
Affiliations
[1] Seoul National University, Department of Computer Science and Engineering
[2] Sookmyung Women’s University, Department of Computer Science
Source
Cluster Computing | 2020 / Vol. 23, No. 3
Keywords
GPU resource sharing; GPU management; GPU scheduling; GPU virtualization
DOI
Not available
Abstract
Container-based virtualization is an innovative technology that accelerates software development by making applications portable and maintainable. Recently, a growing number of workloads, such as high-performance computing (HPC) and deep learning (DL), have been deployed in container-based environments. However, GPU resource management in container-based clusters remains challenging, especially GPU memory oversubscription, which causes substantial performance loss. This paper proposes an adaptive fair-share method for sharing GPUs effectively in a container-based virtualization environment, as well as an execution rescheduling method that manages the execution order of containers to maximize performance gain. We also propose a checkpoint-based mechanism, aimed at DL workloads running with TensorFlow, that can efficiently solve the GPU memory oversubscription problem. We demonstrate that our approach improves overall performance and raises resource utilization compared with the default and static fair-share methods, under both homogeneous and heterogeneous workloads. Compared with these two methods, the proposed method reduces average execution time by 16.37% and 15.61% and increases average GPU memory utilization by approximately 52.46% and 10.3%, respectively. We also evaluated the checkpoint-based mechanism by running multiple CNN workloads with TensorFlow concurrently; the results show that it lets each workload execute safely without out-of-memory (OOM) errors.
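The abstract contrasts a static fair-share baseline with an adaptive one but does not give the algorithm. The sketch below is only an illustration of that contrast, not the paper's method: it assumes a max-min fairness rule, in which GPU memory left unused by small containers is redistributed to larger ones. The function names, container names, and memory figures are all hypothetical.

```python
def static_fair_share(total_mb, demands):
    """Give every container an equal slice; unused memory in a slice is wasted."""
    share = total_mb / len(demands)
    return {name: min(demand, share) for name, demand in demands.items()}


def adaptive_fair_share(total_mb, demands):
    """Max-min fair allocation (an assumed stand-in for the adaptive method).

    Containers that need less than the current equal share receive exactly
    their demand; the memory they leave over is split among the rest.
    """
    alloc = {}
    remaining = float(total_mb)
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        satisfied = {n: d for n, d in pending.items() if d <= share}
        if not satisfied:
            # Every pending container wants more than the equal share,
            # so each one is capped at that share.
            for name in pending:
                alloc[name] = share
            break
        for name, demand in satisfied.items():
            alloc[name] = float(demand)
            remaining -= demand
            del pending[name]
    return alloc


# Hypothetical example: an 8000 MB GPU shared by three containers.
demands = {"a": 1000, "b": 3000, "c": 6000}
static = static_fair_share(8000, demands)
adaptive = adaptive_fair_share(8000, demands)
# adaptive == {"a": 1000.0, "b": 3000.0, "c": 4000.0}
```

In this example the static split leaves part of the GPU memory idle because container "a" cannot use its full slice, whereas the max-min rule hands that slack to "b" and "c" and allocates all 8000 MB. That is the kind of utilization gap the abstract reports, under the stated assumption about the allocation rule.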
Pages: 2179-2191
Page count: 12