Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Cited by: 1
Authors
Choi, HyeonSeong [1 ]
Kim, Youngrang [2 ]
Lee, Jaehwan [3 ]
Kim, Yoonhee [4 ]
Affiliations
[1] Korea Aerosp Univ, KAU, Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[2] Korea Aerosp Univ, Goyang City, Gyeonggi Do, South Korea
[3] Korea Aerosp Univ, Dept Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[4] Sookmyung Womens Univ, Comp Sci Dept, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU; MPI;
DOI
10.3837/tiis.2021.03.006
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Recently, most cloud services use a Docker container environment to provide their services. However, no prior studies have evaluated the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the performance of the parameter server architecture and the All-reduce architecture, which are typical distributed deep learning architectures. Further, we analyze the performance of two separate multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. Through experiments, we compare OpenMPI and MPICH, which are representative open-source MPI libraries, and NCCL, which is NVIDIA's collective communication library for the multi-GPU setting. In the parameter server architecture, we show that using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. We also show that using NCCL in the All-reduce architecture reduces communication latency by up to 93% compared to the other libraries.
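As context for the abstract, the following is a minimal, illustrative sketch of the all-reduce gradient-synchronization pattern the paper benchmarks, written against the standard C MPI API. The buffer size, the placeholder gradient values, and the launch commands in the note below are assumptions for illustration only, not the authors' benchmark code.

```c
/* Illustrative all-reduce sketch (not the authors' benchmark code).
 * Each rank contributes a local gradient buffer; MPI_Allreduce sums the
 * buffers across ranks so every rank receives the aggregated gradients.
 * With a CUDA-aware MPI build (e.g. CUDA-aware OpenMPI), the buffers
 * could be GPU device pointers instead of host memory. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define GRAD_COUNT (1 << 20)  /* assumed gradient size: 1M floats */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *local_grad  = malloc(GRAD_COUNT * sizeof(float));
    float *global_grad = malloc(GRAD_COUNT * sizeof(float));
    for (size_t i = 0; i < GRAD_COUNT; i++)
        local_grad[i] = (float)rank;  /* stand-in for computed gradients */

    /* Sum gradients across all ranks (one rank per GPU or per container). */
    MPI_Allreduce(local_grad, global_grad, GRAD_COUNT,
                  MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("allreduce done on %d ranks, grad[0]=%f\n",
               size, global_grad[0]);

    free(local_grad);
    free(global_grad);
    MPI_Finalize();
    return 0;
}
```

Such a program would typically be launched with something like `mpirun -np 4 ./allreduce_bench`. The two allocation policies compared in the paper correspond roughly to running one rank in each single-GPU container (e.g. `docker run --gpus device=0 ...`) versus running several ranks inside one container that sees all GPUs (e.g. `docker run --gpus all ...`).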
Pages: 911 - 931
Number of pages: 21