On the convergence and improvement of stochastic normalized gradient descent

Cited by: 0
Authors
Shen-Yi ZHAO [1]
Yin-Peng XIE [1]
Wu-Jun LI [1]
Affiliation
[1] National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University
Keywords
DOI: not available
CLC classification number
TP181 [Automated reasoning, machine learning]; O224 [Mathematical theory of optimization];
Subject classification number
Abstract
Non-convex models, such as deep neural networks, have been widely used in machine learning applications. Training non-convex models is a difficult task owing to the saddle points of the models. Recently, stochastic normalized gradient descent (SNGD), which updates the model parameter by a normalized gradient in each iteration, has attracted much attention. Existing results show that SNGD can achieve better performance in escaping saddle points than classical training methods such as stochastic gradient descent (SGD). However, none of the existing studies has provided a theoretical proof of the convergence of SNGD for non-convex problems. In this paper, we first prove the convergence of SNGD for non-convex problems. In particular, we prove that SNGD can achieve the same computation complexity as SGD. In addition, based on our convergence proof of SNGD, we find that SNGD needs to adopt a small constant learning rate to guarantee convergence. This prevents SNGD from performing well when training large non-convex models in practice. Hence, we propose a new method, called stagewise SNGD (S-SNGD), to improve the performance of SNGD. Different from SNGD, in which a small constant learning rate is necessary to guarantee convergence, S-SNGD can adopt a large initial learning rate and reduce the learning rate stage by stage. The convergence of S-SNGD can also be theoretically proved for non-convex problems. Empirical results on deep neural networks show that S-SNGD achieves better performance than SNGD in terms of both training loss and test accuracy.
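To make the two update rules described in the abstract concrete, the Python sketch below illustrates SNGD, which divides each stochastic gradient by its norm and uses a small constant learning rate, and S-SNGD, which starts from a larger learning rate and shrinks it stage by stage. This is a minimal illustrative sketch based only on the abstract, not the authors' implementation; the function names, stage count, and decay factor are assumptions.

import numpy as np

def sngd(w, grad_fn, lr=0.01, num_iters=1000, eps=1e-12):
    # SNGD: w <- w - lr * g / ||g||, with a small constant learning rate lr.
    for _ in range(num_iters):
        g = grad_fn(w)  # stochastic (mini-batch) gradient at w
        w = w - lr * g / (np.linalg.norm(g) + eps)
    return w

def stagewise_sngd(w, grad_fn, lr0=1.0, decay=0.1, stages=3, iters_per_stage=1000):
    # S-SNGD (sketch): run SNGD in stages, reducing the learning rate after each stage.
    # lr0, decay, and the number of stages here are illustrative, not values from the paper.
    lr = lr0
    for _ in range(stages):
        w = sngd(w, grad_fn, lr=lr, num_iters=iters_per_stage)
        lr *= decay
    return w

In practice grad_fn would return a mini-batch gradient of the training loss; the key difference from plain SGD is the division by the gradient norm, which makes the step length depend only on the learning rate rather than on the gradient magnitude.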
Pages: 105-117 (13 pages)