Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Cited: 0
Authors
Lian, Xiangru [1 ]
Huang, Yijun [1 ]
Li, Yuncheng [1 ]
Liu, Ji [1 ]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Asynchronous parallel implementations of stochastic gradient (SG) methods have been broadly used for training deep neural networks and have achieved many practical successes recently. However, existing theory cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill this gap and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared-memory system. We establish an ergodic convergence rate of O(1/√K) for both algorithms and prove that a linear speedup is achievable if the number of workers is bounded by √K, where K is the total number of iterations. Our results generalize and improve the existing analysis for convex minimization.
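The shared-memory variant the abstract refers to is in the spirit of lock-free (Hogwild!-style) SG. Below is a minimal Python sketch, not the authors' implementation: worker threads read the shared iterate without locks, compute a stochastic gradient of a smooth nonconvex loss, and write the update in place, so reads may be stale or inconsistent. The synthetic sigmoid-loss objective, step size, batch size, and thread count are illustrative assumptions.

import threading
import numpy as np

# Synthetic nonconvex problem (assumption): squared error of a sigmoid model.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
y = rng.integers(0, 2, size=1000).astype(float)
x = np.zeros(20)  # shared parameter vector, updated lock-free by all workers

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_grad(w, gen, batch=16):
    # Mini-batch gradient of f(w) = 0.5 * mean((sigmoid(Aw) - y)^2),
    # a smooth but nonconvex objective.
    idx = gen.integers(0, A.shape[0], size=batch)
    Ai, yi = A[idx], y[idx]
    p = sigmoid(Ai @ w)
    return Ai.T @ ((p - yi) * p * (1.0 - p)) / batch

def worker(seed, num_steps=2000, step_size=0.05):
    gen = np.random.default_rng(seed)  # per-thread RNG (generators are not thread-safe)
    for _ in range(num_steps):
        snapshot = x.copy()            # lock-free read; may be stale or inconsistent
        g = stochastic_grad(snapshot, gen)
        x[:] -= step_size * g          # lock-free in-place write; races are tolerated

# CPython threads interleave under the GIL, so this shows the update pattern
# (stale reads, unsynchronized writes) rather than true hardware parallelism.
threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final loss:", 0.5 * np.mean((sigmoid(A @ x) - y) ** 2))

Per the abstract, running K total updates this way keeps the ergodic average of the expected squared gradient norm at O(1/√K), and with up to √K workers the speedup over serial SG is linear.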
Pages: 9
Related papers
50 records in total
  • [1] Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
    Fang, Cong
    Lin, Zhouchen
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 794 - 800
  • [2] Asynchronous parallel algorithms for nonconvex optimization
    Cannelli, Loris
    Facchinei, Francisco
    Kungurtsev, Vyacheslav
    Scutari, Gesualdo
    [J]. MATHEMATICAL PROGRAMMING, 2020, 184 (1-2) : 121 - 154
  • [3] Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization
    Akyildiz, Omer Deniz
    Crisan, Dan
    Miguez, Joaquin
    [J]. STATISTICS AND COMPUTING, 2020, 30 (06) : 1645 - 1663
  • [4] Decentralized Asynchronous Nonconvex Stochastic Optimization on Directed Graphs
    Kungurtsev, Vyacheslav
    Morafah, Mahdi
    Javidi, Tara
    Scutari, Gesualdo
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2023, 10 (04): 1796 - 1804
  • [5] Asynchronous Parallel Nonconvex Large-Scale Optimization
    Cannelli, L.
    Facchinei, F.
    Kungurtsev, V.
    Scutari, G.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 4706 - 4710
  • [6] Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
    Horvath, Samuel
    Lei, Lihua
    Richtarik, Peter
    Jordan, Michael I.
    [J]. SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2022, 4 (02): 634 - 648
  • [7] Asynchronous Decentralized Parallel Stochastic Gradient Descent
    Lian, Xiangru
    Zhang, Wei
    Zhang, Ce
    Liu, Ji
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018
  • [8] Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization
    Ermol'ev, Yu. M.
    Norkin, V. I.
    [J]. Cybernetics and Systems Analysis, 1998, 34 : 196 - 215