An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Cited by: 10
Authors
Wang, Kang [1]
Dou, Yong [1]
Sun, Tao [1]
Qiao, Peng [1]
Wen, Dong [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Natl Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
Funding
U.S. National Science Foundation
Keywords
automatic learning rate decay strategy; fast convergence; generalization performance; neural networks; stochastic gradient descent optimization algorithms; training stability; ARCHITECTURES; GAME; GO;
DOI
10.1002/int.22883
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stochastic gradient descent (SGD) and its variants play a vital role in training neural networks and attract growing attention across science and engineering applications of intelligent systems. The choice of learning rate affects the convergence rate of these optimization methods. Current learning-rate adjustment strategies face two main problems: (1) traditional learning-rate decay schedules are set manually over the training iterations, and the small learning rates they produce cause slow convergence when training neural networks; (2) adaptive methods such as Adam often generalize poorly. To alleviate these issues, we propose a novel automatic learning-rate decay strategy for SGD-type optimization methods in neural networks. Based on the observation that the upper bound on the convergence rate is minimized at each iteration with respect to the current learning rate, we first derive an expression for the current learning rate in terms of the historical learning rates; only one extra parameter needs to be initialized to generate automatically decreasing learning rates during training. The proposed approach is applied to the SGD and Momentum SGD optimization algorithms, and its convergence is established with a concrete theoretical proof. Numerical experiments are conducted on the MNIST and CIFAR-10 data sets with different neural networks. The results show that our algorithm outperforms existing classical methods, achieving faster convergence, better stability, and better generalization in neural network training. It also lays a foundation for large-scale parallel search of initial parameters in intelligent systems.
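The abstract does not spell out the closed-form decay rule, so the sketch below is only a hedged illustration: `auto_decay_lr`, the extra parameter `gamma`, and the specific recursion (dividing the initial rate by one plus `gamma` times the sum of all past rates) are hypothetical placeholders, not the formula derived in the paper. The sketch only mirrors the interface the abstract describes: a learning rate computed from its own history with a single extra parameter, plugged into SGD and Momentum SGD on a toy least-squares problem.

```python
# Illustrative sketch only. The decay rule here is a hypothetical stand-in for the
# history-driven, single-parameter strategy described in the abstract; it is NOT the
# expression derived in the paper.
import numpy as np


def auto_decay_lr(lr_history, gamma):
    """Hypothetical automatic decay: the next learning rate is computed from the
    accumulated history of past rates and a single extra parameter gamma, so the
    sequence decreases without a hand-tuned schedule."""
    return lr_history[0] / (1.0 + gamma * sum(lr_history))


def train_sgd(grad_fn, w, lr0=0.1, gamma=0.05, momentum=0.0, steps=200):
    """SGD / Momentum SGD driven by the automatic decay sketched above."""
    lr_history = [lr0]
    velocity = np.zeros_like(w)
    for _ in range(steps):
        lr = lr_history[-1]
        g = grad_fn(w)
        velocity = momentum * velocity - lr * g  # plain SGD when momentum == 0
        w = w + velocity
        lr_history.append(auto_decay_lr(lr_history, gamma))
    return w, lr_history


if __name__ == "__main__":
    # Toy least-squares problem: minimize ||Xw - y||^2 / (2n) with mini-batch gradients.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.01 * rng.normal(size=256)

    def grad_fn(w):
        idx = rng.integers(0, 256, size=32)  # stochastic mini-batch of 32 samples
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / len(idx)

    w_hat, lrs = train_sgd(grad_fn, np.zeros(10), momentum=0.9)
    print("final parameter error:", np.linalg.norm(w_hat - w_true))
    print("learning rate decayed from", lrs[0], "to", round(lrs[-1], 5))
```

With `momentum=0.0` the same loop reduces to plain SGD, matching the two optimizers the strategy is applied to in the paper; only `lr0` and the single decay parameter `gamma` are initialized by hand.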
Pages: 7334-7355
Number of pages: 22
Related Papers
50 records in total
  • [1] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    [J]. ENTROPY, 2020, 22 (05)
  • [2] Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
    Cui, Xiaodong
    Zhang, Wei
    Tuske, Zoltan
    Picheny, Michael
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [3] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05) : 17 - 31
  • [6] Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks
    Morse, Gregory
    Stanley, Kenneth O.
    [J]. GECCO'16: PROCEEDINGS OF THE 2016 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2016, : 477 - 484
  • [7] AUTOMATIC AND SIMULTANEOUS ADJUSTMENT OF LEARNING RATE AND MOMENTUM FOR STOCHASTIC GRADIENT-BASED OPTIMIZATION METHODS
    Lancewicki, Tomer
    Kopru, Selcuk
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3127 - 3131
  • [8] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
    Li, Yuanzhi
    Liang, Yingyu
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [9] Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
    Veeriah, Vivek
    Zhang, Shangtong
    Sutton, Richard S.
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 : 445 - 459
  • [10] Is Learning in Biological Neural Networks Based on Stochastic Gradient Descent? An Analysis Using Stochastic Processes
    Christensen, Soeren
    Kallsen, Jan
    [J]. NEURAL COMPUTATION, 2024, 36 (07) : 1424 - 1432