Scaling up stochastic gradient descent for non-convex optimisation

Citations: 1
Authors
Mohamad, Saad [1 ]
Alamri, Hamad [2 ]
Bouchachia, Abdelhamid [1 ]
Affiliations
[1] Bournemouth Univ, Dept Comp, Poole, Dorset, England
[2] Univ Warwick, WMG, Coventry, W Midlands, England
Funding
EU Horizon 2020
Keywords
Stochastic gradient descent; Large scale non-convex optimisation; Distributed and parallel computation; Variational inference; Deep reinforcement learning;
DOI
10.1007/s10994-022-06243-3
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimising differentiable objective functions. In this paper, we propose and discuss a novel approach to scaling up SGD in applications involving non-convex functions and large datasets. We address the bottlenecks that arise when using either shared or distributed memory: the former is typically bounded by limited computation resources and bandwidth, whereas the latter suffers from communication overhead. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies into a unified framework, DPSGD strikes a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD achieves speed-up with respect to the number of cores and the number of workers while guaranteeing an asymptotic convergence rate of O(1/√T), provided that the number of cores is bounded by T^(1/4) and the number of workers is bounded by T^(1/2), where T is the number of iterations. The potential gains of DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies show the performance of the proposed DPSGD against state-of-the-art DRL algorithms.
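The abstract describes a scheme that combines asynchronous distribution across workers with lock-free parallelism within each worker. The following is a minimal Python sketch of the lock-free, asynchronous update pattern on a toy non-convex objective; it is not the authors' DPSGD implementation, and all names (worker, local_steps, the sine-link regression loss, the learning rate) are illustrative assumptions.

```python
# Illustrative sketch only: lock-free, asynchronous SGD threads sharing one
# parameter vector, in the spirit of the pattern the abstract describes.
import threading
import numpy as np

DIM, N = 10, 2000
data_rng = np.random.default_rng(0)
data = data_rng.normal(size=(N, DIM))          # synthetic inputs (assumption)
true_w = data_rng.normal(size=DIM)
targets = np.sin(data @ true_w)                # non-convex sine-link targets

theta = np.zeros(DIM)                          # shared parameters, no locks

def grad(theta_snapshot, idx):
    # Stochastic gradient of the squared loss of a sine-link model (non-convex).
    x, y = data[idx], targets[idx]
    z = x @ theta_snapshot
    return 2.0 * (np.sin(z) - y) * np.cos(z) * x

def worker(seed, local_steps=1000, lr=0.01):
    # Each thread stands in for a core: it reads a possibly stale snapshot of
    # the shared parameters and writes its update back without synchronisation.
    rng = np.random.default_rng(seed)
    for _ in range(local_steps):
        idx = int(rng.integers(N))
        g = grad(theta.copy(), idx)
        theta[:] = theta - lr * g              # unsynchronised in-place update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_loss = np.mean((np.sin(data @ theta) - targets) ** 2)
print(f"mean squared loss after lock-free SGD: {final_loss:.4f}")
```

In a DPSGD-style deployment, threads of this kind would share memory within a single worker, and each worker would additionally exchange parameters with the other workers asynchronously; the single thread pool here stands in for both levels of the hierarchy.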
Pages: 4039-4079 (41 pages)