Design of an adaptive GPU sharing and scheduling scheme in container-based cluster

Cited by: 0
Authors
Qichen Chen
Jisun Oh
Seoyoung Kim
Yoonhee Kim
Affiliations
[1] Seoul National University, Department of Computer Science and Engineering
[2] Sookmyung Women’s University, Department of Computer Science
Source
Cluster Computing | 2020 / Vol. 23
Keywords
GPU resource sharing; GPU management; GPU scheduling; GPU virtualization
DOI
Not available
Abstract
Container-based virtualization is an innovative technology that accelerates software development by making applications portable and maintainable. A growing number of workloads, such as high-performance computing (HPC) and deep learning (DL), are now deployed in container-based environments. However, GPU resource management in container-based clusters remains challenging, especially GPU memory oversubscription, which causes substantial performance loss. This paper proposes an adaptive fair-share method for sharing GPUs effectively in a container-based virtualization environment, together with an execution rescheduling method that manages the execution order of the containers to maximize performance gain. We also propose a checkpoint-based mechanism, designed specifically for DL workloads running on TensorFlow, that efficiently solves the GPU memory oversubscription problem. We demonstrate that our approach improves both overall performance and resource utilization compared with the default and static fair-share methods, for homogeneous as well as heterogeneous workloads. Compared with these two baselines, respectively, the proposed method reduces average execution time by 16.37% and 15.61% and increases average GPU memory utilization by approximately 52.46% and 10.3%. We also evaluated the checkpoint-based mechanism by running multiple CNN workloads with TensorFlow concurrently; the results show that it ensures every workload executes safely without out-of-memory (OOM) errors.
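The abstract does not state the allocation rule behind the adaptive fair-share method. Purely as a hedged illustration of why a fixed equal split can waste GPU memory, the sketch below contrasts a static equal split with a generic max-min (water-filling) fair share, in which memory that a light container does not need is redistributed to heavier ones. This is a textbook fairness policy, not the authors' algorithm; the function names, container names and MiB figures are hypothetical.

```python
from typing import Dict


def static_fair_share(capacity_mib: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical static policy: each container gets capacity / n, capped at its demand."""
    if not demands:
        return {}
    share = capacity_mib // len(demands)
    return {name: min(d, share) for name, d in demands.items()}


def max_min_fair_share(capacity_mib: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Generic max-min (water-filling) split, NOT the paper's method:
    fully satisfy small demands first, then divide what is left
    equally among the containers that still want more."""
    alloc = {name: 0 for name in demands}
    remaining = capacity_mib
    pending = sorted(demands, key=demands.get)  # smallest demand first
    while pending:
        share = remaining // len(pending)
        smallest = pending[0]
        if demands[smallest] <= share:
            # The smallest demand fits in an equal slice: grant it in full.
            alloc[smallest] = demands[smallest]
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            # Every pending container wants more than an equal slice:
            # give each the same share and stop.
            for name in pending:
                alloc[name] = share
            remaining -= share * len(pending)
            break
    return alloc


if __name__ == "__main__":
    demands = {"a": 2000, "b": 6000, "c": 10000}  # hypothetical per-container demand (MiB)
    print(static_fair_share(16000, demands))
    print(max_min_fair_share(16000, demands))
```

With a hypothetical 16000 MiB budget and demands of 2000, 6000 and 10000 MiB, the static split grants 2000, 5333 and 5333 MiB and leaves 3334 MiB idle, while the water-filling split grants 2000, 6000 and 8000 MiB and uses the whole budget. Neither policy by itself prevents oversubscription when total demand exceeds capacity, which is the gap the paper's checkpoint-based mechanism is described as addressing.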
Pages: 2179-2191
Number of pages: 12