Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

Cited: 0
Authors
Shen, Shuheng [1 ,2 ]
Xu, Linli [1 ,2 ]
Liu, Jingchang [1 ,2 ]
Liang, Xianfeng [1 ,2 ]
Cheng, Yifei [1 ,3 ]
Affiliations
[1] Univ Sci & Technol China, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China
[3] Univ Sci & Technol China, Sch Data Sci, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
With the increase in the amount of data and the expansion of model scale, distributed parallel training has become an important and successful technique for addressing the resulting optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, in practice they are limited significantly by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm that runs computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has lower communication complexity and better time speedup compared with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3x faster than traditional synchronous SGD.
Pages: 4582-4589
Number of pages: 8
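
The decoupling idea summarized in the abstract, running the inter-worker model averaging concurrently with further local SGD steps, can be illustrated with a minimal PyTorch-style sketch. This is an illustrative assumption, not the authors' implementation: the function name `cocod_round`, the helper names, and the use of a non-blocking `torch.distributed` all-reduce stand in for whatever communication backend the paper actually uses.

```python
import torch
import torch.distributed as dist


def cocod_round(model, optimizer, data_iter, local_steps, world_size):
    """One decoupled round (sketch): average the round-start model in the
    background while continuing local SGD steps on the local model."""
    # Snapshot the local parameters at the start of the round.
    old_local = [p.detach().clone() for p in model.parameters()]

    # Launch a NON-BLOCKING all-reduce of the snapshot (reduced in place to
    # the sum over workers); the computation below overlaps with this transfer.
    send_bufs = [t.clone() for t in old_local]
    handles = [dist.all_reduce(b, op=dist.ReduceOp.SUM, async_op=True)
               for b in send_bufs]

    # Local computation proceeds while communication is in flight.
    for _ in range(local_steps):
        x, y = next(data_iter)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

    # Block only now, then combine: the averaged round-start model plus the
    # local progress made during this round.
    for h in handles:
        h.wait()
    with torch.no_grad():
        for p, old, summed in zip(model.parameters(), old_local, send_bufs):
            p.copy_(summed / world_size + (p - old))
```

The non-blocking all-reduce on the round-start snapshot is what hides the communication: the local SGD steps run while the averaging is in flight, and the round only blocks at the final `wait()` before folding the averaged model back into the local one.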