On the convergence and improvement of stochastic normalized gradient descent

Citations: 0
Authors
Shen-Yi ZHAO [1 ]
Yin-Peng XIE [1 ]
Wu-Jun LI [1 ]
Affiliations
[1] National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University
Keywords
DOI
Not available
CLC Classification Number
TP181 [automated reasoning; machine learning]; O224 [mathematical theory of optimization]
Discipline Classification Code
Abstract
Non-convex models, like deep neural networks, have been widely used in machine learning applications. Training non-convex models is a difficult task owing to the saddle points of models. Recently, stochastic normalized gradient descent (SNGD), which updates the model parameter by a normalized gradient in each iteration, has attracted much attention. Existing results show that SNGD can achieve better performance on escaping saddle points than classical training methods like stochastic gradient descent (SGD). However, none of the existing studies has provided a theoretical proof of the convergence of SNGD for non-convex problems. In this paper, we first prove the convergence of SNGD for non-convex problems. In particular, we prove that SNGD can achieve the same computation complexity as SGD. In addition, based on our convergence proof of SNGD, we find that SNGD needs to adopt a small constant learning rate to guarantee convergence. This prevents SNGD from performing well on training large non-convex models in practice. Hence, we propose a new method, called stagewise SNGD (S-SNGD), to improve the performance of SNGD. Different from SNGD, in which a small constant learning rate is necessary for the convergence guarantee, S-SNGD can adopt a large initial learning rate and reduce the learning rate by stage. The convergence of S-SNGD can also be theoretically proved for non-convex problems. Empirical results on deep neural networks show that S-SNGD achieves better performance than SNGD in terms of both training loss and test accuracy.
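The abstract describes the SNGD update (a fixed-length step along the normalized stochastic gradient) and the stagewise learning-rate schedule of S-SNGD only at a high level. The following is a minimal NumPy sketch under those stated assumptions; the update form w <- w - lr * g/||g||, the decay factor, and all names (sngd_step, stagewise_sngd, stoch_grad) are illustrative and not the authors' exact algorithm.

    import numpy as np

    def sngd_step(w, grad, lr, eps=1e-12):
        # One SNGD update: a step of fixed length lr along the normalized
        # stochastic gradient, independent of the gradient's magnitude.
        return w - lr * grad / (np.linalg.norm(grad) + eps)

    def stagewise_sngd(w0, stoch_grad, lr0=1.0, decay=0.1,
                       num_stages=4, iters_per_stage=1000):
        # Illustrative S-SNGD loop (assumed schedule): start from a large
        # learning rate lr0 and shrink it after every stage, in contrast to
        # plain SNGD's small constant learning rate.
        # stoch_grad(w) is a user-supplied stochastic gradient oracle.
        w, lr = np.array(w0, dtype=float), lr0
        for _ in range(num_stages):
            for _ in range(iters_per_stage):
                w = sngd_step(w, stoch_grad(w), lr)
            lr *= decay
        return w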
Pages: 105-117
Number of pages: 13
Related Papers
50 records in total
  • [1] On the convergence and improvement of stochastic normalized gradient descent
    Shen-Yi Zhao
    Yin-Peng Xie
    Wu-Jun Li
    Science China Information Sciences, 2021, 64
  • [2] On the convergence and improvement of stochastic normalized gradient descent
    Zhao, Shen-Yi
    Xie, Yin-Peng
    Li, Wu-Jun
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (03)
  • [3] Convergence of Stochastic Gradient Descent for PCA
    Shamir, Ohad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [4] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    arXiv, 2019,
  • [5] Convergence analysis of gradient descent stochastic algorithms
    Shapiro, A
    Wardi, Y
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1996, 91 (02) : 439 - 454
  • [6] On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
    Li, Xiaoyu
    Orabona, Francesco
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [7] Global Convergence and Stability of Stochastic Gradient Descent
    Patel, Vivak
    Zhang, Shushu
    Tian, Bowen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [8] Linear Convergence of Adaptive Stochastic Gradient Descent
    Xie, Yuege
    Wu, Xiaoxia
    Ward, Rachel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [9] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Bai-cun Zhou
    Cong-ying Han
    Tian-de Guo
    Acta Mathematicae Applicatae Sinica, English Series, 2021, 37 : 126 - 136
  • [10] Optimized convergence of stochastic gradient descent by weighted averaging
    Hagedorn, Melinda
    Jarre, Florian
    OPTIMIZATION METHODS & SOFTWARE, 2024, 39 (04): : 699 - 724