A Hierarchical, Bulk-Synchronous Stochastic Gradient Descent Algorithm for Deep-Learning Applications on GPU Clusters

Cited: 8
Authors
Cong, Guojing [1 ]
Bhardwaj, Onkar [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
DOI
10.1109/ICMLA.2017.00-56
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Training data and models are becoming increasingly large in many deep-learning applications, and large-scale distributed processing is employed to accelerate training. Increasing the number of learners in synchronous and asynchronous stochastic gradient descent, however, presents challenges to both convergence and communication performance. We present a hierarchical, bulk-synchronous stochastic gradient descent algorithm that effectively balances execution time and accuracy for training deep-learning applications on GPU clusters. At scale, it achieves much better convergence and execution time than asynchronous stochastic gradient descent implementations. Deployed on a cluster of 128 GPUs, our implementation achieves up to a 56-fold speedup over sequential stochastic gradient descent with similar test accuracy for our target application.
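For intuition, the following minimal, single-process NumPy simulation sketches the two-level synchronization pattern the abstract describes: learner replicas are averaged within a group at every step, while groups are reconciled globally only periodically. The toy least-squares objective, group sizes, learning rate, and synchronization period are illustrative assumptions, not details taken from the paper.

import numpy as np

# Hypothetical sketch of hierarchical bulk-synchronous SGD: replicas are
# averaged within a group every step (cheap, e.g., within a node), and
# across groups only every `global_period` steps (expensive, cross-node).
# All sizes and the toy objective below are illustrative assumptions.

rng = np.random.default_rng(0)

# Toy problem: minimize ||X w - y||^2 over w.
n_samples, dim = 4096, 32
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

n_groups, learners_per_group = 4, 4          # 16 simulated learners ("GPUs")
batch, lr, steps, global_period = 64, 0.05, 200, 8

# One model replica per learner, all starting from the same point.
w = np.zeros((n_groups, learners_per_group, dim))

def grad(wi):
    """Stochastic gradient of the mean-squared loss on a random mini-batch."""
    idx = rng.integers(0, n_samples, size=batch)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ wi - yb) / batch

for t in range(steps):
    for g in range(n_groups):
        # Local step: each learner in the group takes an SGD step.
        for k in range(learners_per_group):
            w[g, k] -= lr * grad(w[g, k])
        # Intra-group bulk synchronization: replace every replica in the
        # group with the group average.
        w[g, :] = w[g].mean(axis=0)
    # Inter-group synchronization is less frequent, since it crosses the
    # slower network between nodes.
    if (t + 1) % global_period == 0:
        w[:, :] = w.mean(axis=(0, 1))

print("final training loss:", np.mean((X @ w[0, 0] - y) ** 2))

The outer synchronization period is the knob that trades convergence quality against cross-node communication cost, which is the balance the abstract claims the algorithm strikes.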
Pages: 818-821
Page count: 4
Related papers
42 items in total
  • [1] An efficient, distributed stochastic gradient descent algorithm for deep-learning applications
    Cong, Guojing
    Bhardwaj, Onkar
    Feng, Minwei
    [J]. 2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2017, : 11 - 20
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    [J]. 2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures
    Ma, Yujing
    Rusu, Florin
    Wu, Kesheng
    Sim, Alexander
    [J]. 2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 6 - 15
  • [4] Deep-learning density functionals for gradient descent optimization
    Costa, E.
    Scriva, G.
    Fazio, R.
    Pilati, S.
    [J]. PHYSICAL REVIEW E, 2022, 106 (04)
  • [5] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53 (sup1) : 53 - 62
  • [6] A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning
    Shim, Duk-Sun
    Shim, Joseph
    [J]. INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2023, 21 (11) : 3825 - 3831
  • [7] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    [J]. MATHEMATICS, 2023, 11 (03)
  • [8] GPUSGD: A GPU-accelerated stochastic gradient descent algorithm for matrix factorization
    Jin, Jing
    Lai, Siyan
    Hu, Su
    Lin, Jing
    Lin, Xiaola
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (14) : 3844 - 3865
  • [9] Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
    Guo, Pengzhan
    Ye, Zeyang
    Xiao, Keli
    Zhu, Wei
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (10) : 5037 - 5050