Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Cited by: 1
Authors
Choi, HyeonSeong [1 ]
Kim, Youngrang [2 ]
Lee, Jaehwan [3 ]
Kim, Yoonhee [4 ]
Affiliations
[1] Korea Aerosp Univ, KAU, Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[2] Korea Aerosp Univ, Goyang City, Gyeonggi Do, South Korea
[3] Korea Aerosp Univ, Dept Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[4] Sookmyung Womens Univ, Comp Sci Dept, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU; MPI;
DOI
10.3837/tiis.2021.03.006
CLC Number
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Recently, most cloud services use the Docker container environment to provide their services. However, there is no existing research that evaluates the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the performance of the parameter server architecture and the All-reduce architecture, which are the typical distributed deep learning architectures. Furthermore, we analyze the performance of two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, which are representative open-source MPI libraries, with NCCL, NVIDIA's collective communication library for multi-GPU settings. In the parameter server architecture, we show that using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. We also show that using NCCL in the All-reduce architecture reduces communication latency by up to 93% compared to the other libraries.
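As an illustration of the All-reduce pattern the abstract benchmarks, below is a minimal sketch in C, assuming a CUDA-aware OpenMPI build and one MPI rank per GPU; the buffer size, device count, and variable names are illustrative and not taken from the paper.

/* Minimal sketch: gradient all-reduce with CUDA-aware MPI.
 * Assumes a CUDA-aware OpenMPI build, so MPI_Allreduce can take
 * device pointers directly; sizes and names are illustrative. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One GPU per rank, e.g. one rank per Docker container;
     * four GPUs assumed, matching the paper's largest setting. */
    cudaSetDevice(rank % 4);

    const int n = 1 << 20;            /* 1M gradient elements (illustrative) */
    float *d_grad;
    cudaMalloc((void **)&d_grad, (size_t)n * sizeof(float));
    /* ... a training step would fill d_grad here ... */

    /* CUDA-aware MPI: the device pointer goes straight to MPI_Allreduce,
     * so no explicit device-to-host staging copy is needed. */
    MPI_Allreduce(MPI_IN_PLACE, d_grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d_grad);
    MPI_Finalize();
    return 0;
}

Launched with one rank per container (e.g., mpirun -np 4 across four containers), the device pointer is passed directly to MPI_Allreduce, which is the CUDA-aware behavior the abstract evaluates; the analogous NCCL call is ncclAllReduce, which the abstract reports as the fastest option in the All-reduce architecture.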
Pages: 911-931
Number of pages: 21
Related Papers
50 records in total
  • [41] Performance Prediction of GPU-based Deep Learning Applications
    Gianniti, Eugenio
    Zhang, Li
    Ardagna, Danilo
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 167 - 170
  • [42] Performance Prediction of GPU-based Deep Learning Applications
    Gianniti, Eugenio
    Zhang, Li
    Ardagna, Danilo
    CLOSER: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2019, : 279 - 286
  • [43] Multi-GPU Implementation and Performance Optimization for CSR-Based Sparse Matrix-Vector Multiplication
    Guo, Ping
    Zhang, Changjiang
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2419 - 2423
  • [44] A multi-GPU based high-performance computing framework in elastodynamics simulation using octree meshes
    Mohammadian, Shayan
    Kumar, Ankit S.
    Song, Chongmin
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2025, 436
  • [45] A Topology-Aware Performance Prediction Model for Distributed Deep Learning on GPU Clusters
    Lin, Zheyu
    Chen, Xukun
    Zhao, Hanyu
    Luan, Yunteng
    Yang, Zhi
    Dai, Yafei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 2795 - 2801
  • [46] Performance Comparison of TPU, GPU, CPU on Google Colaboratory over Distributed Deep Learning
    Kimm, Haklin
    Paik, Incheon
    Kimm, Hanke
    2021 IEEE 14TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC 2021), 2021, : 312 - 319
  • [47] Detailed Performance Analysis of Distributed Tensorflow on a GPU Cluster using Deep Learning Algorithms
    Malik, Abid
    Lu, Micheal
    Wang, Nathenial
    Lin, Yeiwei
    Yoo, Shinjae
    2018 NEW YORK SCIENTIFIC DATA SUMMIT (NYSDS), 2018,
  • [48] Container Allocation in Cloud Environment Using Multi-Agent Deep Reinforcement Learning
    Danino, Tom
    Ben-Shimol, Yehuda
    Greenberg, Shlomo
    ELECTRONICS, 2023, 12 (12)
  • [49] XML-based agent communication in a Distributed Learning Environment
    Leung, EWC
    Li, Q
    ADVANCES IN WEB-BASED LEARNING - ICWL 2004, 2004, 3143 : 136 - 146
  • [50] The High Performance Computing for 3D Dynamic Holographic Simulation Based on Multi-GPU Cluster
    Zhang Yingxi
    Lin Tingyu
    Guo Liqin
    THEORY, METHODOLOGY, TOOLS AND APPLICATIONS FOR MODELING AND SIMULATION OF COMPLEX SYSTEMS, PT I, 2016, 643 : 431 - 441