Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

Cited: 0
Authors
Shen, Shuheng [1 ,2 ]
Xu, Linli [1 ,2 ]
Liu, Jingchang [1 ,2 ]
Liang, Xianfeng [1 ,2 ]
Cheng, Yifei [1 ,3 ]
Affiliations
[1] Univ Sci & Technol China, Anhui Prov Key Lab Big Data Anal & Applicat, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Anhui, Peoples R China
[3] Univ Sci & Technol China, Sch Data Sci, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
With the increase in the amount of data and the expansion of model scale, distributed parallel training has become an important and successful technique for addressing the resulting optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, in practice they are limited significantly by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm that runs computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has lower communication complexity and better time speedup compared with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 GeForce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3x faster than traditional synchronous SGD.
Pages: 4582-4589
Number of pages: 8
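
The decoupling idea summarized in the abstract, running the inter-worker model averaging concurrently with further local SGD steps, can be illustrated with a minimal PyTorch-style sketch. This is an illustrative assumption, not the authors' implementation: the function name `cocod_round`, the helper names, and the use of a non-blocking `torch.distributed` all-reduce stand in for whatever communication backend the paper actually uses.

```python
import torch
import torch.distributed as dist


def cocod_round(model, optimizer, data_iter, local_steps, world_size):
    """One decoupled round (sketch): average the round-start model in the
    background while continuing local SGD steps on the local model."""
    # Snapshot the local parameters at the start of the round.
    old_local = [p.detach().clone() for p in model.parameters()]

    # Launch a NON-BLOCKING all-reduce of the snapshot (reduced in place to
    # the sum over workers); the computation below overlaps with this transfer.
    send_bufs = [t.clone() for t in old_local]
    handles = [dist.all_reduce(b, op=dist.ReduceOp.SUM, async_op=True)
               for b in send_bufs]

    # Local computation proceeds while communication is in flight.
    for _ in range(local_steps):
        x, y = next(data_iter)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

    # Block only now, then combine: the averaged round-start model plus the
    # local progress made during this round.
    for h in handles:
        h.wait()
    with torch.no_grad():
        for p, old, summed in zip(model.parameters(), old_local, send_bufs):
            p.copy_(summed / world_size + (p - old))
```

The non-blocking all-reduce on the round-start snapshot is what hides the communication: the local SGD steps run while the averaging is in flight, and the round only blocks at the final `wait()` before folding the averaged model back into the local one.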