A deep learning container cloud for GPU resources

Cited by: 0

Authors:

Affiliations:

[1] Xiao, Yi

[2] Gao, Pengdong

[3] Qi, Quan

[4] Lu, Yongquan

Source:

Gao, Pengdong (pdgao@cuc.edu.cn) | Year: 1600 / Volume: Universidad Central de Venezuela / Issue: 55

Keywords:

Graphics processing unit - Topology - Deep neural networks - Distributed computer systems - Computer aided instruction

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

With the development of deep learning, deep learning frameworks have become an important tool for developing deep neural networks. A framework greatly shortens network construction and computing time, and its computing power comes from the GPU. However, how to allocate and use GPU resources effectively among many frameworks in a heterogeneous cluster remains an important issue. In this paper, we propose a Deep Learning Container Cloud (DLC) architecture designed specifically for GPU resources. Because containers are easy to deploy and migrate, the frameworks can be deployed on a heterogeneous cluster in the form of containers, and the GPU driver can be decoupled from the container by means of an NVIDIA-docker volume. The DLC provides its services as a MESOS framework. After obtaining resources through the scheduler, it quickly creates a deep learning framework that meets the requirements. The DLC loads the specified GPU resource and the corresponding runtime library to rapidly create a deep learning environment of a specific version. In addition, this paper proposes an allocation algorithm based on GPU topology. The DLC constructs a topo-tree by analyzing the GPU topology of each agent node and, on this basis, assigns GPUs that support P2P within the node. Our experiment shows that using P2P data transmission in containers significantly increases bandwidth. This is of great significance for promoting the development of deep learning.
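
The abstract does not give the details of the topology-based allocation, so the following is only a minimal sketch of one way such an idea could work, not the authors' algorithm. It assumes the topo-tree has GPUs as leaves and interconnects (for example PCIe switches) as internal nodes, and that GPUs under the same lower-level node are the likeliest to support P2P. The names `TopoNode`, `allocate`, and `build_example_tree`, and the "smallest subtree with enough free GPUs" heuristic, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopoNode:
    """Hypothetical topo-tree node: internal nodes model interconnects, leaves model GPUs."""
    name: str
    children: List["TopoNode"] = field(default_factory=list)
    gpu_id: Optional[int] = None  # set only on leaf (GPU) nodes
    free: bool = True             # meaningful only on leaves

    def free_gpus(self) -> List["TopoNode"]:
        """Return all free GPU leaves under this node, left to right."""
        if self.gpu_id is not None:
            return [self] if self.free else []
        found: List["TopoNode"] = []
        for child in self.children:
            found.extend(child.free_gpus())
        return found


def allocate(root: TopoNode, count: int) -> List[int]:
    """Take `count` free GPUs from the tightest subtree that can satisfy the request.

    Assumption: a subtree with fewer free GPUs sits lower in the tree, so its
    GPUs share a closer interconnect. Returns [] if the request cannot be met.
    """
    best: Optional[List[TopoNode]] = None
    stack = [root]
    while stack:
        node = stack.pop()
        candidates = node.free_gpus()
        if len(candidates) >= count:
            if best is None or len(candidates) < len(best):
                best = candidates
            # Only descend into subtrees that could still hold the request.
            stack.extend(reversed(node.children))
    if best is None:
        return []
    chosen = best[:count]
    for leaf in chosen:
        leaf.free = False  # mark the GPUs as assigned
    return [leaf.gpu_id for leaf in chosen]


def build_example_tree() -> TopoNode:
    """Build an illustrative node with two switches and two GPUs per switch."""
    gpus = [TopoNode(name=f"gpu{i}", gpu_id=i) for i in range(4)]
    switch0 = TopoNode(name="switch0", children=gpus[0:2])
    switch1 = TopoNode(name="switch1", children=gpus[2:4])
    return TopoNode(name="agent", children=[switch0, switch1])
```

In this sketch, if GPU 0 is already busy, a request for two GPUs is served from `switch1` (GPUs 2 and 3) rather than from GPUs 1 and 2, which sit under different switches. A real scheduler would need to confirm P2P capability from the driver instead of inferring it from tree position.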

Citations

50 in total

[1] Vertical Autoscaling of GPU Resources for Machine Learning in the Cloud
Jang, Hyeon-Jun
Yim, Yin-Goo
Jin, Hyun-Wook
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5710 - 5712
[2] Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
Wu, Bingyang
Zhang, Zili
Bai, Zhihao
Liu, Xuanzhe
Jin, Xin
PROCEEDINGS OF THE 20TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION, NSDI 2023, 2023, : 69 - 85
[3] An optimal defensive deception framework for the container-based cloud with deep reinforcement learning
Li, Huanruo
Guo, Yunfei
Sun, Penghao
Wang, Yawen
Huo, Shumin
IET INFORMATION SECURITY, 2022, 16 (03) : 178 - 192
[4] Container Allocation in Cloud Environment Using Multi-Agent Deep Reinforcement Learning
Danino, Tom
Ben-Shimol, Yehuda
Greenberg, Shlomo
ELECTRONICS, 2023, 12 (12)
[5] A fine-grained GPU sharing and job scheduling for deep learning jobs on the cloud
Chung, Wu-Chun
Tong, Jyun-Sen
Chen, Zhi-Hao
JOURNAL OF SUPERCOMPUTING, 2025, 81 (02):
[6] Multi-Tier GPU Virtualization for Deep Learning in Cloud-Edge Systems
Kennedy, Jason
Sharma, Vishal
Varghese, Blesson
Reano, Carlos
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (07) : 2107 - 2123
[7] An Optimal Active Defensive Security Framework for the Container-Based Cloud with Deep Reinforcement Learning
Li, Yuanbo
Hu, Hongchao
Liu, Wenyan
Yang, Xiaohan
ELECTRONICS, 2023, 12 (07)
[8] DSTS: A hybrid optimal and deep learning for dynamic scalable task scheduling on container cloud environment
Saravanan Muniswamy
Radhakrishnan Vignesh
Journal of Cloud Computing, 11
[9] DSTS: A hybrid optimal and deep learning for dynamic scalable task scheduling on container cloud environment
Muniswamy, Saravanan
Vignesh, Radhakrishnan
JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2022, 11 (01):
[10] Efficient Container Scheduling With Hybrid Deep Learning Model for Improved Service Reliability in Cloud Computing
Jeon, Jueun
Park, Sihyun
Jeong, Byeonghui
Jeong, Young-Sik
IEEE ACCESS, 2024, 12 : 65166 - 65177
