An efficient, distributed stochastic gradient descent algorithm for deep-learning applications

Cited by: 6
Authors
Cong, Guojing [1 ]
Bhardwaj, Onkar [1 ]
Feng, Minwei [1 ]
Affiliations
[1] IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
Keywords
NEURAL-NETWORKS;
DOI
10.1109/ICPP.2017.10
CLC number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Parallel and distributed processing is employed to accelerate training for many deep-learning applications with large models and inputs. Asynchronous stochastic gradient descent (ASGD), derived from stochastic gradient descent (SGD), is widely used because it reduces synchronization and communication overhead by tolerating stale gradient updates. Recent theoretical analyses show that ASGD converges with linear asymptotic speedup over SGD. However, these analyses often gloss over the communication overhead and practical learning rates that are critical to ASGD performance. After analyzing the communication performance and convergence behavior of ASGD, using the Downpour algorithm as an example, we demonstrate the challenges ASGD faces in achieving good practical speedup over SGD. We propose a distributed, bulk-synchronous stochastic gradient descent algorithm that allows sparse gradient aggregation from individual learners. Its communication cost is amortized explicitly by a gradient aggregation interval, and global reductions are used instead of a parameter server for gradient aggregation. We prove its convergence and show that it achieves superior communication performance and convergence behavior over popular ASGD implementations such as Downpour and EAMSGD for deep-learning applications.
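To make the bulk-synchronous scheme described in the abstract concrete: each learner takes local SGD steps, accumulates its gradients, and every few iterations the accumulated gradients are combined with a global reduction (allreduce) rather than through a parameter server, so communication is amortized over the aggregation interval. The sketch below is a simplified illustration under these assumptions, not the authors' implementation; it uses mpi4py and NumPy, the names `local_gradient` and `tau` and the toy quadratic objective are hypothetical placeholders, and the paper's sparse per-learner aggregation is not modeled.

```python
# A minimal sketch (not the authors' implementation) of bulk-synchronous SGD
# with a gradient aggregation interval, using a global reduction (allreduce)
# instead of a parameter server. Assumes mpi4py and NumPy are installed;
# local_gradient() and the toy quadratic objective are hypothetical placeholders.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

dim = 1000              # toy model size
tau = 4                 # gradient aggregation interval (iterations per allreduce)
lr = 0.01               # learning rate
params = np.zeros(dim)  # each learner holds a local replica of the model

def local_gradient(w, rng):
    """Hypothetical mini-batch gradient: noisy gradient of 0.5 * ||w - 1||^2."""
    return (w - 1.0) + 0.01 * rng.standard_normal(w.shape)

rng = np.random.default_rng(seed=rank)
accum = np.zeros_like(params)   # gradients accumulated since the last reduction
snapshot = params.copy()        # common state at the last global reduction

for it in range(100):
    g = local_gradient(params, rng)
    params -= lr * g            # local SGD step between reductions
    accum += g
    if (it + 1) % tau == 0:
        # Bulk-synchronous step: sum the accumulated gradients of all learners
        # with one allreduce, amortizing communication over tau iterations.
        total = np.empty_like(accum)
        comm.Allreduce(accum, total, op=MPI.SUM)
        # Restart every replica from the shared snapshot plus the averaged
        # aggregate update, so all learners agree again after the reduction.
        params = snapshot - lr * (total / size)
        snapshot = params.copy()
        accum[:] = 0.0

if rank == 0:
    print("parameter mean after training:", params.mean())
```

Run with, for example, `mpirun -np 4 python bsp_sgd_sketch.py`. With interval `tau`, each learner communicates once every `tau` iterations instead of after every mini-batch, which is the explicit amortization of communication cost the abstract refers to.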
Pages: 11-20
Number of pages: 10
Related papers
50 records in total
  • [1] A Hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters
    Cong, Guojing
    Bhardwaj, Onkar
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 818 - 821
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    [J]. 2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models
    Teng, Yunfei
    Gao, Wenbo
    Chalus, Francois
    Choromanska, Anna
    Goldfarb, Donald
    Weller, Adrian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Deep-learning density functionals for gradient descent optimization
    Costa, E.
    Scriva, G.
    Fazio, R.
    Pilati, S.
    [J]. PHYSICAL REVIEW E, 2022, 106 (04)
  • [5] Generalizing projected gradient descent algorithm for massive MIMO detection based on deep-learning
    Huang, Yongming
    Wang, Zheng
    [J]. Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2024, 54 (04): 961 - 971
  • [6] A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning
    Jiang, Wenbin
    Ye, Geyan
    Yang, Laurence T.
    Zhu, Jian
    Ma, Yang
    Xie, Xia
    Jin, Hai
    [J]. 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 391 - 398
  • [7] Communication-Efficient Local Stochastic Gradient Descent for Scalable Deep Learning
    Lee, Sunwoo
    Kang, Qiao
    Agrawal, Ankit
    Choudhary, Alok
    Liao, Wei-keng
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 718 - 727
  • [8] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53: 53 - 62
  • [9] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    [J]. MATHEMATICS, 2023, 11 (03)
  • [10] A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning
    Shim, Duk-Sun
    Shim, Joseph
    [J]. International Journal of Control, Automation and Systems, 2023, 21 : 3825 - 3831