On the convergence and improvement of stochastic normalized gradient descent

Cited by: 0
Authors
Shen-Yi ZHAO [1]
Yin-Peng XIE [1]
Wu-Jun LI [1]
Affiliation
[1] National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University
DOI
Not available
Chinese Library Classification (CLC)
TP181 [Automated reasoning, machine learning]; O224 [Mathematical theory of optimization]
Abstract
Non-convex models, such as deep neural networks, are widely used in machine learning applications. Training non-convex models is difficult owing to their saddle points. Recently, stochastic normalized gradient descent (SNGD), which updates the model parameter with a normalized gradient in each iteration, has attracted much attention. Existing results show that SNGD escapes saddle points better than classical training methods such as stochastic gradient descent (SGD). However, no existing study has provided a theoretical proof of the convergence of SNGD for non-convex problems. In this paper, we first prove the convergence of SNGD for non-convex problems. In particular, we prove that SNGD achieves the same computation complexity as SGD. In addition, based on our convergence proof, we find that SNGD must adopt a small constant learning rate to guarantee convergence, which prevents it from performing well when training large non-convex models in practice. Hence, we propose a new method, called stagewise SNGD (S-SNGD), to improve the performance of SNGD. Unlike SNGD, for which a small constant learning rate is necessary to guarantee convergence, S-SNGD can adopt a large initial learning rate and reduce the learning rate stage by stage. The convergence of S-SNGD can also be proved theoretically for non-convex problems. Empirical results on deep neural networks show that S-SNGD achieves better performance than SNGD in terms of both training loss and test accuracy.
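
The following Python sketch, written from the description in the abstract, illustrates the two update rules: SNGD takes a fixed-length step along the normalized stochastic gradient, while S-SNGD shrinks that step length stage by stage. The function names (sngd_step, s_sngd), the stochastic-gradient callback sample_grad, and the hyperparameter values (lr0, decay, stage lengths) are illustrative assumptions, not taken from the paper.

import numpy as np

def sngd_step(w, grad, lr, eps=1e-12):
    # SNGD update: step along the normalized stochastic gradient,
    # so the step length is lr regardless of the gradient's magnitude.
    return w - lr * grad / (np.linalg.norm(grad) + eps)

def s_sngd(w, sample_grad, lr0=1.0, decay=0.1, num_stages=3, iters_per_stage=100):
    # S-SNGD (sketch): run SNGD in stages, starting from a large learning
    # rate and reducing it at the end of every stage.
    lr = lr0
    for _ in range(num_stages):
        for _ in range(iters_per_stage):
            w = sngd_step(w, sample_grad(w), lr)  # sample_grad returns a stochastic gradient at w
        lr *= decay  # reduce the learning rate by stage
    return w

Because the gradient is normalized, the step length depends only on the learning rate, which is the property the abstract connects to SNGD's behaviour near saddle points and to its need for a small constant learning rate in the convergence guarantee.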
Pages: 105-117
Page count: 13