A Hierarchical, Bulk-Synchronous Stochastic Gradient Descent Algorithm for Deep-Learning Applications on GPU Clusters

Cited: 8
Authors
Cong, Guojing [1 ]
Bhardwaj, Onkar [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, 1101 Kitchawan Rd, Yorktown Hts, NY 10598 USA
DOI
10.1109/ICMLA.2017.00-56
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Training data and models are becoming increasingly large in many deep-learning applications, and large-scale distributed processing is employed to accelerate training. Increasing the number of learners in synchronous and asynchronous stochastic gradient descent, however, presents challenges to both convergence and communication performance. We present a hierarchical, bulk-synchronous stochastic gradient descent algorithm that effectively balances execution time and accuracy for training deep-learning applications on GPU clusters. At scale, it achieves much better convergence and execution time than asynchronous stochastic gradient descent implementations. Deployed on a cluster of 128 GPUs, our implementation achieves up to a 56-fold speedup over sequential stochastic gradient descent with similar test accuracy for our target application.
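For intuition, the following minimal, single-process NumPy simulation sketches the two-level synchronization pattern the abstract describes: learner replicas are averaged within a group at every step, while groups are reconciled globally only periodically. The toy least-squares objective, group sizes, learning rate, and synchronization period are illustrative assumptions, not details taken from the paper.

import numpy as np

# Hypothetical sketch of hierarchical bulk-synchronous SGD: replicas are
# averaged within a group every step (cheap, e.g., within a node), and
# across groups only every `global_period` steps (expensive, cross-node).
# All sizes and the toy objective below are illustrative assumptions.

rng = np.random.default_rng(0)

# Toy problem: minimize ||X w - y||^2 over w.
n_samples, dim = 4096, 32
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

n_groups, learners_per_group = 4, 4          # 16 simulated learners ("GPUs")
batch, lr, steps, global_period = 64, 0.05, 200, 8

# One model replica per learner, all starting from the same point.
w = np.zeros((n_groups, learners_per_group, dim))

def grad(wi):
    """Stochastic gradient of the mean-squared loss on a random mini-batch."""
    idx = rng.integers(0, n_samples, size=batch)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ wi - yb) / batch

for t in range(steps):
    for g in range(n_groups):
        # Local step: each learner in the group takes an SGD step.
        for k in range(learners_per_group):
            w[g, k] -= lr * grad(w[g, k])
        # Intra-group bulk synchronization: replace every replica in the
        # group with the group average.
        w[g, :] = w[g].mean(axis=0)
    # Inter-group synchronization is less frequent, since it crosses the
    # slower network between nodes.
    if (t + 1) % global_period == 0:
        w[:, :] = w.mean(axis=(0, 1))

print("final training loss:", np.mean((X @ w[0, 0] - y) ** 2))

The outer synchronization period is the knob that trades convergence quality against cross-node communication cost, which is the balance the abstract claims the algorithm strikes.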
Pages: 818-821
Page count: 4
Related papers
42 items in total
  • [1] An efficient, distributed stochastic gradient descent algorithm for deep-learning applications
    Cong, Guojing
    Bhardwaj, Onkar
    Feng, Minwei
    [J]. 2017 46TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2017, : 11 - 20
  • [2] A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    Li, Bo
    [J]. 2018 IEEE 24TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2018), 2018, : 425 - 432
  • [3] Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures
    Ma, Yujing
    Rusu, Florin
    Wu, Kesheng
    Sim, Alexander
    [J]. 2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021, : 6 - 15
  • [4] Deep-learning density functionals for gradient descent optimization
    Costa, E.
    Scriva, G.
    Fazio, R.
    Pilati, S.
    [J]. PHYSICAL REVIEW E, 2022, 106 (04)
  • [5] Deep learning for sea cucumber detection using stochastic gradient descent algorithm
    Zhang, Huaqiang
    Yu, Fusheng
    Sun, Jincheng
    Shen, Xiaoqin
    Li, Kun
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2020, 53 (sup1) : 53 - 62
  • [6] A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning
    Shim, Duk-Sun
    Shim, Joseph
    [J]. INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2023, 21 (11) : 3825 - 3831
  • [7] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    [J]. MATHEMATICS, 2023, 11 (03)
  • [8] GPUSGD: A GPU-accelerated stochastic gradient descent algorithm for matrix factorization
    Jin, Jing
    Lai, Siyan
    Hu, Su
    Lin, Jing
    Lin, Xiaola
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (14) : 3844 - 3865
  • [9] Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
    Guo, Pengzhan
    Ye, Zeyang
    Xiao, Keli
    Zhu, Wei
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (10) : 5037 - 5050