An efficient, distributed stochastic gradient descent algorithm for deep-learning applications

Cited by: 6
Authors
Cong, Guojing [1 ]
Bhardwaj, Onkar [1 ]
Feng, Minwei [1 ]
Affiliations
[1] IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
Keywords
NEURAL-NETWORKS;
DOI
10.1109/ICPP.2017.10
CLC number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Parallel and distributed processing is employed to accelerate training for many deep-learning applications with large models and inputs. Asynchronous stochastic gradient descent (ASGD), derived from stochastic gradient descent (SGD), is widely used because it reduces synchronization and communication overhead by tolerating stale gradient updates. Recent theoretical analyses show that ASGD converges with linear asymptotic speedup over SGD. However, these analyses often gloss over the communication overhead and practical learning rates that are critical to ASGD performance. After analyzing the communication performance and convergence behavior of ASGD, using the Downpour algorithm as an example, we demonstrate the challenges ASGD faces in achieving good practical speedup over SGD. We propose a distributed, bulk-synchronous stochastic gradient descent algorithm that allows sparse gradient aggregation from individual learners. Its communication cost is amortized explicitly by a gradient aggregation interval, and global reductions are used instead of a parameter server for gradient aggregation. We prove its convergence and show that it achieves superior communication performance and convergence behavior over popular ASGD implementations such as Downpour and EAMSGD for deep-learning applications.
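To make the bulk-synchronous scheme described in the abstract concrete: each learner takes local SGD steps, accumulates its gradients, and every few iterations the accumulated gradients are combined with a global reduction (allreduce) rather than through a parameter server, so communication is amortized over the aggregation interval. The sketch below is a simplified illustration under these assumptions, not the authors' implementation; it uses mpi4py and NumPy, the names `local_gradient` and `tau` and the toy quadratic objective are hypothetical placeholders, and the paper's sparse per-learner aggregation is not modeled.

```python
# A minimal sketch (not the authors' implementation) of bulk-synchronous SGD
# with a gradient aggregation interval, using a global reduction (allreduce)
# instead of a parameter server. Assumes mpi4py and NumPy are installed;
# local_gradient() and the toy quadratic objective are hypothetical placeholders.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

dim = 1000              # toy model size
tau = 4                 # gradient aggregation interval (iterations per allreduce)
lr = 0.01               # learning rate
params = np.zeros(dim)  # each learner holds a local replica of the model

def local_gradient(w, rng):
    """Hypothetical mini-batch gradient: noisy gradient of 0.5 * ||w - 1||^2."""
    return (w - 1.0) + 0.01 * rng.standard_normal(w.shape)

rng = np.random.default_rng(seed=rank)
accum = np.zeros_like(params)   # gradients accumulated since the last reduction
snapshot = params.copy()        # common state at the last global reduction

for it in range(100):
    g = local_gradient(params, rng)
    params -= lr * g            # local SGD step between reductions
    accum += g
    if (it + 1) % tau == 0:
        # Bulk-synchronous step: sum the accumulated gradients of all learners
        # with one allreduce, amortizing communication over tau iterations.
        total = np.empty_like(accum)
        comm.Allreduce(accum, total, op=MPI.SUM)
        # Restart every replica from the shared snapshot plus the averaged
        # aggregate update, so all learners agree again after the reduction.
        params = snapshot - lr * (total / size)
        snapshot = params.copy()
        accum[:] = 0.0

if rank == 0:
    print("parameter mean after training:", params.mean())
```

Run with, for example, `mpirun -np 4 python bsp_sgd_sketch.py`. With interval `tau`, each learner communicates once every `tau` iterations instead of after every mini-batch, which is the explicit amortization of communication cost the abstract refers to.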
Pages: 11-20
Number of pages: 10
Related papers
50 records in total
  • [1] A Hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters
    Cong, Guojing
    Bhardwaj, Onkar
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 818 - 821
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    [J]. 2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Deep-learning density functionals for gradient descent optimization
    Costa, E.
    Scriva, G.
    Fazio, R.
    Pilati, S.
    [J]. PHYSICAL REVIEW E, 2022, 106 (04)
  • [5] Generalizing projected gradient descent algorithm for massive MIMO detection based on deep-learning
    Huang, Yongming
    Wang, Zheng
    [J]. Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2024, 54 (04): 961 - 971
  • [6] A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning
    Jiang, Wenbin
    Ye, Geyan
    Yang, Laurence T.
    Zhu, Jian
    Ma, Yang
    Xie, Xia
    Jin, Hai
    [J]. 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 391 - 398
  • [7] Communication-Efficient Local Stochastic Gradient Descent for Scalable Deep Learning
    Lee, Sunwoo
    Kang, Qiao
    Agrawal, Ankit
    Choudhary, Alok
    Liao, Wei-keng
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 718 - 727
  • [8] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53: 53 - 62
  • [9] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    [J]. MATHEMATICS, 2023, 11 (03)
  • [10] A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning
    Shim, Duk-Sun
    Shim, Joseph
    [J]. International Journal of Control, Automation and Systems, 2023, 21 : 3825 - 3831