A deep learning container cloud for GPU resources

Cited by: 0
Authors
Affiliations
[1] Xiao, Yi
[2] Gao, Pengdong
[3] Qi, Quan
[4] Lu, Yongquan
Source
Gao, Pengdong (pdgao@cuc.edu.cn) | Year: 1600 / Volume: Universidad Central de Venezuela / Issue: 55
Keywords
Graphics processing unit - Topology - Deep neural networks - Distributed computer systems - Computer aided instruction
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
With the development of deep learning, deep learning frameworks have become an important tool for building deep neural networks. A framework greatly shortens network construction and computing time, and its computing power comes from the GPU. However, how to allocate and use GPU resources effectively across many frameworks in a heterogeneous cluster remains an important issue. In this paper, we propose a Deep Learning Container Cloud (DLC) architecture designed specifically for GPU resources. Because containers are easy to deploy and migrate, the frameworks can be deployed on a heterogeneous cluster as containers, and the GPU driver can be decoupled from the container by means of an NVIDIA-docker volume. The DLC provides its services as a Mesos framework. After obtaining resources through the scheduler, it quickly creates a deep learning framework that meets the requirements. DLC loads the specified GPU resource and the corresponding runtime library, so that a deep learning environment of a specific version can be created rapidly. In addition, this paper proposes an allocation algorithm based on GPU topology. DLC constructs a topo-tree by analyzing the GPU topology of the agent node and, on this basis, assigns GPUs that support P2P communication within the node. Our experiments show that P2P data transmission in containers significantly increases bandwidth, which is of great significance for promoting the development of deep learning.
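The topology-based allocation described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify. It is a minimal Python illustration under stated assumptions: the topo-tree is modeled as a hypothetical `TopoNode` hierarchy (root, PCIe switches, GPU leaves), GPUs under the same switch are assumed to be P2P-capable, and `smallest_fit`, `allocate`, and `build_example` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TopoNode:
    """Hypothetical topo-tree node: an interconnect level or a GPU leaf."""
    name: str
    children: List["TopoNode"] = field(default_factory=list)
    gpu_id: Optional[int] = None  # set only on GPU leaves
    free: bool = True             # meaningful only on GPU leaves

    def free_gpus(self) -> List[int]:
        """Return the ids of all free GPUs under this node, left to right."""
        if self.gpu_id is not None:
            return [self.gpu_id] if self.free else []
        ids: List[int] = []
        for child in self.children:
            ids.extend(child.free_gpus())
        return ids


def smallest_fit(node: TopoNode, k: int) -> Optional[TopoNode]:
    """Descend into the first child that still holds at least k free GPUs;
    if no child does, stay at this node. Returns None if k GPUs are not free."""
    if len(node.free_gpus()) < k:
        return None
    for child in node.children:
        hit = smallest_fit(child, k)
        if hit is not None:
            return hit
    return node


def _mark_busy(node: TopoNode, chosen: Set[int]) -> None:
    """Mark the chosen GPU leaves as allocated."""
    if node.gpu_id in chosen:
        node.free = False
    for child in node.children:
        _mark_busy(child, chosen)


def allocate(root: TopoNode, k: int) -> List[int]:
    """Reserve k GPUs from the tightest subtree found; [] if none fits."""
    target = smallest_fit(root, k)
    if target is None:
        return []
    chosen = target.free_gpus()[:k]
    _mark_busy(root, set(chosen))
    return chosen


def build_example() -> TopoNode:
    """Assumed layout: one node, two PCIe switches, two GPUs per switch."""
    sw0 = TopoNode("switch0", [TopoNode("gpu0", gpu_id=0), TopoNode("gpu1", gpu_id=1)])
    sw1 = TopoNode("switch1", [TopoNode("gpu2", gpu_id=2), TopoNode("gpu3", gpu_id=3)])
    return TopoNode("root", [sw0, sw1])


if __name__ == "__main__":
    tree = build_example()
    print(allocate(tree, 1))  # takes one GPU from switch0
    print(allocate(tree, 2))  # picks both GPUs under switch1, not a cross-switch pair
```

In this sketch, a two-GPU request issued after a one-GPU request is placed entirely under the second switch rather than split across switches, which is the kind of P2P-friendly placement the abstract refers to. On NVIDIA systems, the input for such a tree could be derived from the matrix printed by `nvidia-smi topo -m`; that parsing step is omitted here.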