Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Cited by: 1
Authors
Choi, HyeonSeong [1 ]
Kim, Youngrang [2 ]
Lee, Jaehwan [3 ]
Kim, Yoonhee [4 ]
Affiliations
[1] Korea Aerosp Univ, KAU, Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[2] Korea Aerosp Univ, Goyang City, Gyeonggi Do, South Korea
[3] Korea Aerosp Univ, Dept Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[4] Sookmyung Womens Univ, Comp Sci Dept, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU; MPI;
DOI
10.3837/tiis.2021.03.006
CLC Number
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Recently, most cloud services use the Docker container environment to provide their services. However, there is no existing research that evaluates the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the performance of the parameter server architecture and the All-reduce architecture, which are the typical distributed deep learning architectures. Furthermore, we analyze the performance of two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, which are representative open-source MPI libraries, with NCCL, NVIDIA's collective communication library for multi-GPU settings. In the parameter server architecture, we show that using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. We also show that using NCCL in the All-reduce architecture reduces communication latency by up to 93% compared to the other libraries.
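As an illustration of the All-reduce pattern the abstract benchmarks, below is a minimal sketch in C, assuming a CUDA-aware OpenMPI build and one MPI rank per GPU; the buffer size, device count, and variable names are illustrative and not taken from the paper.

/* Minimal sketch: gradient all-reduce with CUDA-aware MPI.
 * Assumes a CUDA-aware OpenMPI build, so MPI_Allreduce can take
 * device pointers directly; sizes and names are illustrative. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One GPU per rank, e.g. one rank per Docker container;
     * four GPUs assumed, matching the paper's largest setting. */
    cudaSetDevice(rank % 4);

    const int n = 1 << 20;            /* 1M gradient elements (illustrative) */
    float *d_grad;
    cudaMalloc((void **)&d_grad, (size_t)n * sizeof(float));
    /* ... a training step would fill d_grad here ... */

    /* CUDA-aware MPI: the device pointer goes straight to MPI_Allreduce,
     * so no explicit device-to-host staging copy is needed. */
    MPI_Allreduce(MPI_IN_PLACE, d_grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d_grad);
    MPI_Finalize();
    return 0;
}

Launched with one rank per container (e.g., mpirun -np 4 across four containers), the device pointer is passed directly to MPI_Allreduce, which is the CUDA-aware behavior the abstract evaluates; the analogous NCCL call is ncclAllReduce, which the abstract reports as the fastest option in the All-reduce architecture.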
Pages: 911-931
Number of pages: 21
Related Papers
50 records in total
  • [41] Performance Prediction of GPU-based Deep Learning Applications
    Gianniti, Eugenio
    Zhang, Li
    Ardagna, Danilo
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 167 - 170
  • [42] Performance Prediction of GPU-based Deep Learning Applications
    Gianniti, Eugenio
    Zhang, Li
    Ardagna, Danilo
    CLOSER: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2019, : 279 - 286
  • [43] Multi-GPU Implementation and Performance Optimization for CSR-Based Sparse Matrix-Vector Multiplication
    Guo, Ping
    Zhang, Changjiang
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2419 - 2423
  • [44] A multi-GPU based high-performance computing framework in elastodynamics simulation using octree meshes
    Mohammadian, Shayan
    Kumar, Ankit S.
    Song, Chongmin
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2025, 436
  • [45] A Topology-Aware Performance Prediction Model for Distributed Deep Learning on GPU Clusters
    Lin, Zheyu
    Chen, Xukun
    Zhao, Hanyu
    Luan, Yunteng
    Yang, Zhi
    Dai, Yafei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 2795 - 2801
  • [46] Performance Comparison of TPU, GPU, CPU on Google Colaboratory over Distributed Deep Learning
    Kimm, Haklin
    Paik, Incheon
    Kimm, Hanke
    2021 IEEE 14TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2021), 2021, : 312 - 319
  • [47] Detailed Performance Analysis of Distributed Tensorflow on a GPU Cluster using Deep Learning Algorithms
    Malik, Abid
    Lu, Micheal
    Wang, Nathenial
    Lin, Yeiwei
    Yoo, Shinjae
    2018 NEW YORK SCIENTIFIC DATA SUMMIT (NYSDS), 2018,
  • [48] Container Allocation in Cloud Environment Using Multi-Agent Deep Reinforcement Learning
    Danino, Tom
    Ben-Shimol, Yehuda
    Greenberg, Shlomo
    ELECTRONICS, 2023, 12 (12)
  • [49] XML-based agent communication in a Distributed Learning Environment
    Leung, EWC
    Li, Q
    ADVANCES IN WEB-BASED LEARNING - ICWL 2004, 2004, 3143 : 136 - 146
  • [50] The High Performance Computing for 3D Dynamic Holographic Simulation Based on Multi-GPU Cluster
    Zhang Yingxi
    Lin Tingyu
    Guo Liqin
    THEORY, METHODOLOGY, TOOLS AND APPLICATIONS FOR MODELING AND SIMULATION OF COMPLEX SYSTEMS, PT I, 2016, 643 : 431 - 441