A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning

Times Cited: 0
Authors
Shi, Shaohuai [1 ]
Wang, Qiang [1 ]
Chu, Xiaowen [1 ]
Li, Bo [2 ]
Affiliations
[1] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Keywords
Deep Learning; Graphics Processing Units; Stochastic Gradient Descent; NVLink; InfiniBand; Directed Acyclic Graph
DOI
10.1109/ICPADS.2018.00063
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline Classification Code
0812
Abstract
With huge amounts of training data, deep learning has made great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed on a cluster equipped with accelerators like GPUs. With the fast increase of GPU computing power, the data communications among GPUs have become a potential bottleneck on the overall training performance. In this paper, we first propose a general directed acyclic graph (DAG) model to describe the distributed synchronous stochastic gradient descent (S-SGD) algorithm, which has been widely used in distributed deep learning frameworks. To understand the practical impact of data communications on training performance, we conduct extensive empirical studies on four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI, CNTK, MXNet and TensorFlow) over multi-GPU and multi-node environments with different data communication techniques, including PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental studies, we identify the potential bottlenecks and overheads that could be further optimized. Finally, we make the data set of our experimental traces publicly available, which could be used to support simulation-based studies.
Pages: 425-432
Number of Pages: 8
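The following is a minimal sketch (an illustration by the editor, not code from the paper) of the core idea in the abstract: one S-SGD iteration expressed as a DAG of per-layer forward, backward, and gradient all-reduce tasks, with the critical path giving an estimate of iteration time. All task names and per-layer costs are hypothetical, and the sketch ignores contention between tasks sharing the same GPU or network link, which a full model such as the paper's would need to account for.

```python
# Sketch: one synchronous SGD iteration as a DAG of tasks, scored by its
# critical path. Per-layer costs are made-up numbers in milliseconds.
from collections import defaultdict


def critical_path(tasks, deps):
    """Longest (finish-time) path through a DAG of task costs and dependencies."""
    finish = {}

    def fin(t):
        if t not in finish:
            finish[t] = tasks[t] + max((fin(p) for p in deps[t]), default=0.0)
        return finish[t]

    return max(fin(t) for t in tasks)


def ssgd_iteration_dag(fwd, bwd, comm, update_cost=1.0):
    """Build the DAG for one S-SGD iteration over len(fwd) layers.
    Gradient all-reduce of layer l may overlap with backward of layers l-1..0."""
    L = len(fwd)
    tasks, deps = {}, defaultdict(list)
    for l in range(L):
        tasks[f"F{l}"] = fwd[l]               # forward pass of layer l
        tasks[f"B{l}"] = bwd[l]               # backward pass of layer l
        tasks[f"C{l}"] = comm[l]              # all-reduce of layer l's gradients
        if l > 0:
            deps[f"F{l}"].append(f"F{l-1}")   # forward runs layer by layer
    for l in reversed(range(L)):
        prev = f"F{L-1}" if l == L - 1 else f"B{l+1}"
        deps[f"B{l}"].append(prev)            # backward runs from last layer to first
        deps[f"C{l}"].append(f"B{l}")         # gradients can be sent once computed
    tasks["U"] = update_cost                  # weight update waits for all all-reduces
    deps["U"] = [f"C{l}" for l in range(L)]
    return tasks, deps


if __name__ == "__main__":
    # Hypothetical per-layer costs (ms) for a 3-layer model.
    fwd, bwd, comm = [2, 3, 4], [4, 5, 6], [6, 4, 3]
    tasks, deps = ssgd_iteration_dag(fwd, bwd, comm)
    print(f"Estimated iteration time: {critical_path(tasks, deps):.1f} ms")
```

Overlapping communication with the remaining backward computation is what makes the DAG view useful here: the iteration time is bounded by the critical path rather than by the plain sum of compute and communication times.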