A deep learning container cloud for GPU resources

Authors
Affiliations
[1] Xiao, Yi
[2] Gao, Pengdong
[3] Qi, Quan
[4] Lu, Yongquan
Source
Gao, Pengdong (pdgao@cuc.edu.cn) | Year: 1600 / Volume: Universidad Central de Venezuela / Issue: 55
Keywords
Graphics processing unit - Topology - Deep neural networks - Distributed computer systems - Computer aided instruction
DOI
Not available
Abstract
With the development of deep learning, deep learning frameworks have become an important tool for developing deep neural networks. These frameworks greatly shorten network construction and computing time, and their computing power comes from GPUs. However, how to allocate and use GPU resources effectively among many frameworks in a heterogeneous cluster remains an important issue. In this paper, we propose a Deep Learning Container Cloud (DLC) architecture designed specifically for GPU resources. Because containers are easy to deploy and migrate, the frameworks can be deployed on a heterogeneous cluster as containers, and the GPU driver can be decoupled from the container by means of NVIDIA-docker volumes. DLC provides its services as a Mesos framework. After obtaining resources through the scheduler, it quickly creates a deep learning framework that meets the requirements. DLC loads the specified GPU resources and the corresponding runtime libraries to rapidly create a deep learning environment of a specific version. In addition, this paper proposes an allocation algorithm based on GPU topology. DLC constructs a topo-tree by analyzing the GPU topology of the agent node and, on this basis, assigns GPUs that support P2P communication within the node. Our experiments show that using P2P data transmission in containers significantly increases bandwidth, which matters for the development of deep learning.
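The abstract does not give the details of the topology-based allocation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a topo-tree whose leaves are GPUs and whose inner nodes are interconnects (for example PCIe switches), and it picks free GPUs from the smallest subtree that can satisfy a request, on the assumption that GPUs under a closer common ancestor are more likely to support P2P transfers. The names `TopoNode`, `allocate`, and `build_example_tree`, as well as the example layout of four GPUs behind two switches, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TopoNode:
    """Node of a hypothetical topo-tree: leaves are GPUs, inner nodes are interconnects."""
    name: str
    children: List["TopoNode"] = field(default_factory=list)
    gpu_id: Optional[int] = None  # set only on leaf (GPU) nodes

    def gpus(self) -> List[int]:
        """Return the IDs of all GPUs below this node, left to right."""
        if self.gpu_id is not None:
            return [self.gpu_id]
        return [g for child in self.children for g in child.gpus()]


def allocate(root: TopoNode, count: int, busy: Set[int]) -> Optional[List[int]]:
    """Pick `count` free GPUs from the smallest subtree that has enough free GPUs.

    Returns None when the node cannot satisfy the request.
    """
    best: Optional[List[int]] = None
    best_span: Optional[int] = None

    def visit(node: TopoNode) -> None:
        nonlocal best, best_span
        span = node.gpus()
        free = [g for g in span if g not in busy]
        if len(free) < count:
            return  # no descendant can satisfy the request either
        if best_span is None or len(span) < best_span:
            best, best_span = free[:count], len(span)
        for child in node.children:
            visit(child)  # a child subtree may be an even tighter fit

    visit(root)
    return best


def build_example_tree() -> TopoNode:
    """Assumed layout: one host bridge, two switches, two GPUs per switch."""
    switch0 = TopoNode("switch0", [TopoNode("gpu0", gpu_id=0), TopoNode("gpu1", gpu_id=1)])
    switch1 = TopoNode("switch1", [TopoNode("gpu2", gpu_id=2), TopoNode("gpu3", gpu_id=3)])
    return TopoNode("host-bridge", [switch0, switch1])
```

With this example tree, a request for two GPUs while GPU 0 is busy is served from the second switch (GPUs 2 and 3) rather than splitting the pair across switches. A real scheduler would also need to confirm P2P capability between the chosen devices instead of inferring it from tree distance alone.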