On the diffusion approximation of nonconvex stochastic gradient descent

Cited by: 31
Authors
Hu, Wenqing [1]
Li, Chris Junchi [2]
Li, Lei [3]
Liu, Jian-Guo [3, 4]
Affiliations
[1] Missouri Univ Sci & Technol, Dept Math & Stat, Rolla, MO USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Duke Univ, Dept Math, Durham, NC 27708 USA
[4] Duke Univ, Dept Phys, Durham, NC 27708 USA
Keywords
Nonconvex optimization; stochastic gradient descent; diffusion approximation; stationary points; batch size; EIGENVALUE; OPERATORS; BEHAVIOR
DOI
10.4310/AMSA.2019.v4.n1.a1
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
We study the stochastic gradient descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly, using the weak form of the master equation for probability evolution. In the small step size regime and in the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp. saddle point): it escapes in a number of iterations exponentially (resp. almost linearly) dependent on the inverse step size. The results are obtained using the theory of random perturbations of dynamical systems (the theory of large deviations for local minimizers and the theory of exiting for unstable stationary points). In addition, we discuss the effects of batch size for deep neural networks, and we find that a small batch size helps SGD algorithms escape unstable stationary points and sharp minimizers. Our theory indicates that using a small batch size at an earlier stage and increasing the batch size at a later stage helps SGD become trapped in flat minimizers for better generalization.
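The weak-approximation claim in the abstract can be illustrated numerically. The following is a minimal sketch and not the authors' construction: it assumes a one-dimensional double-well objective F(x) = (x^2 - 1)^2 / 4, additive Rademacher gradient noise of size sigma, and the commonly used approximating diffusion dX = -F'(X) dt + sqrt(eta) * sigma dW, where one SGD step of size eta corresponds to time eta. The function names and all parameter values are hypothetical. "Weak" here means that expectations of a test function (x^2 below) are compared, not individual paths.

```python
import math
import random


def grad_F(x):
    """Gradient of the toy double-well objective F(x) = (x**2 - 1)**2 / 4."""
    return x * (x * x - 1.0)


def sgd_final(x0, eta, sigma, n_steps, rng):
    """Run SGD with zero-mean Rademacher gradient noise; return the last iterate."""
    x = x0
    for _ in range(n_steps):
        xi = sigma * rng.choice((-1.0, 1.0))  # noise with variance sigma**2
        x -= eta * (grad_F(x) + xi)
    return x


def sde_final(x0, eta, sigma, n_steps, rng, substeps=5):
    """Euler-Maruyama for dX = -F'(X) dt + sqrt(eta)*sigma dW up to time n_steps*eta.

    The SDE form is an assumption of this sketch. A time step finer than eta
    is used so that the scheme is not identical to the SGD recursion itself.
    """
    dt = eta / substeps
    scale = math.sqrt(eta) * sigma * math.sqrt(dt)
    x = x0
    for _ in range(n_steps * substeps):
        x += -grad_F(x) * dt + scale * rng.gauss(0.0, 1.0)
    return x


def compare(eta=0.05, sigma=1.0, n_steps=100, n_runs=1000, seed=0):
    """Monte Carlo estimates of E[x**2] for SGD and for the diffusion."""
    rng = random.Random(seed)
    x0 = 1.0  # start at the local minimizer x = 1
    m_sgd = sum(sgd_final(x0, eta, sigma, n_steps, rng) ** 2
                for _ in range(n_runs)) / n_runs
    m_sde = sum(sde_final(x0, eta, sigma, n_steps, rng) ** 2
                for _ in range(n_runs)) / n_runs
    return m_sgd, m_sde


if __name__ == "__main__":
    m_sgd, m_sde = compare()
    print("E[x^2] under SGD:      ", m_sgd)
    print("E[x^2] under diffusion:", m_sde)
```

With these illustrative settings the two estimates should agree up to Monte Carlo error and a small step-size-dependent bias. Lowering `eta` or raising `n_runs` is one way to probe how the gap behaves.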
Pages: 3-32
Page count: 30
Related papers
50 in total
  • [1] ON UNIFORM-IN-TIME DIFFUSION APPROXIMATION FOR STOCHASTIC GRADIENT DESCENT
    Li, Lei
    Wang, Yuliang
    [J]. METHODS AND APPLICATIONS OF ANALYSIS, 2023, 30 (03) : 95 - 112
  • [2] Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives
    Lei, Yunwen
    Tang, Ke
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4505 - 4511
  • [3] Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions
    Lei, Yunwen
    Hu, Ting
    Li, Guiying
    Tang, Ke
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (10) : 4394 - 4400
  • [4] Nonconvex Stochastic Scaled Gradient Descent and Generalized Eigenvector Problems
    Li, Chris Junchi
    Jordan, Michael I.
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 1230 - 1240
  • [5] ON DISTRIBUTED STOCHASTIC GRADIENT DESCENT FOR NONCONVEX FUNCTIONS IN THE PRESENCE OF BYZANTINES
    Bulusu, Saikiran
    Khanduri, Prashant
    Sharma, Pranay
    Varshney, Pramod K.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3137 - 3141
  • [6] High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails
    Li, Shaojie
    Liu, Yong
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [7] Second-Order Guarantees of Stochastic Gradient Descent in Nonconvex Optimization
    Vlaski, Stefan
    Sayed, Ali H.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (12) : 6489 - 6504
  • [8] pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Nonconvex Optimization
    Zhou, Beitong
    Liu, Jun
    Sun, Weigao
    Chen, Ruijuan
    Tomlin, Claire
    Yuan, Ye
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3258 - 3266
  • [9] ZEROTH-ORDER STOCHASTIC PROJECTED GRADIENT DESCENT FOR NONCONVEX OPTIMIZATION
    Liu, Sijia
    Li, Xingguo
    Chen, Pin-Yu
    Haupt, Jarvis
    Amini, Lisa
    [J]. 2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 1179 - 1183
  • [10] On Nonconvex Decentralized Gradient Descent
    Zeng, Jinshan
    Yin, Wotao
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (11) : 2834 - 2848