Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Cited by: 1
Authors
Choi, HyeonSeong [1 ]
Kim, Youngrang [2 ]
Lee, Jaehwan [3 ]
Kim, Yoonhee [4 ]
Affiliations
[1] Korea Aerosp Univ, KAU, Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[2] Korea Aerosp Univ, Goyang City, Gyeonggi Do, South Korea
[3] Korea Aerosp Univ, Dept Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[4] Sookmyung Womens Univ, Comp Sci Dept, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU; MPI
DOI
10.3837/tiis.2021.03.006
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Recently, most cloud services use a Docker container environment to provide their services. However, there has been little research evaluating the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the performance of the parameter server architecture and the All-reduce architecture, which are typical distributed deep learning architectures. Further, we analyze the performance of two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. Through experiments, we compare OpenMPI and MPICH, which are representative open-source MPI libraries, and NCCL, which is NVIDIA's collective communication library for multi-GPU settings. In the parameter server architecture, we show that using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. Also, we show that using NCCL in the All-reduce architecture reduces communication latency by up to 93% compared to the other libraries.
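The record does not include the authors' benchmark code; the following is only a minimal sketch of the kind of collective-communication microbenchmark the abstract describes. It times an MPI Allreduce over gradient-sized buffers, with one MPI rank launched per GPU or per Docker container depending on the allocation policy. The script name, message sizes, and use of host-side NumPy buffers are assumptions; a CUDA-aware MPI or NCCL run would exchange GPU-resident buffers instead.

```python
# Minimal Allreduce latency sketch (illustrative, not the paper's benchmark).
# Assumes mpi4py and an MPI installation (OpenMPI or MPICH), launched e.g. as:
#   mpirun -np 4 python allreduce_bench.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

for n in (2**16, 2**20, 2**24):            # message sizes in float32 elements
    send = np.ones(n, dtype=np.float32)     # stand-in for a gradient tensor
    recv = np.empty_like(send)

    comm.Barrier()                          # align ranks before timing
    t0 = time.perf_counter()
    comm.Allreduce(send, recv, op=MPI.SUM)  # the collective being measured
    comm.Barrier()
    t1 = time.perf_counter()

    if rank == 0:
        print(f"Allreduce {send.nbytes / 1e6:.1f} MB: {(t1 - t0) * 1e3:.3f} ms")
```

Running such a script with 1, 2, and 4 ranks mirrors the one-to-four GPU scaling experiment described above; swapping the host buffers for GPU buffers would exercise the CUDA-aware code path compared in the paper.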
Pages: 911-931
Number of pages: 21