An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Cited by: 10
Authors
Wang, Kang [1]
Dou, Yong [1]
Sun, Tao [1]
Qiao, Peng [1]
Wen, Dong [1]
Affiliations
[1] National University of Defense Technology, School of Computer, National Laboratory for Parallel and Distributed Processing, Changsha 410073, People's Republic of China
Funding
U.S. National Science Foundation
Keywords
automatic learning rate decay strategy; fast convergence; generalization performance; neural networks; stochastic gradient descent optimization algorithms; training stability; architectures; game; Go
DOI
10.1002/int.22883
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient descent (SGD) series optimization methods play a vital role in training neural networks and attract growing attention across the science and engineering fields of intelligent systems. The choice of learning rate affects the convergence rate of SGD-type methods. Current learning rate adjustment strategies face two main problems: (1) traditional learning rate decay is tuned manually during training, and the resulting small learning rates cause slow convergence; (2) adaptive methods (e.g., Adam) have poor generalization performance. To alleviate these issues, we propose a novel automatic learning rate decay strategy for SGD optimization methods in neural networks. Based on the observation that the upper bound on the convergence rate is minimized at each iteration with respect to the current learning rate, we first derive an expression for the current learning rate in terms of the historical learning rates; only one extra parameter needs to be initialized to generate automatically decreasing learning rates during training. The proposed approach is applied to the SGD and momentum SGD optimization algorithms, and its convergence is established with a concrete theoretical proof. Numerical simulations are conducted on the MNIST and CIFAR-10 data sets with different neural networks. Experimental results show that our algorithm outperforms existing classical ones, achieving a faster convergence rate, better stability, and better generalization performance in neural network training. It also lays a foundation for large-scale parallel search of initial parameters in intelligent systems.
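The abstract describes the mechanism only at a high level: the current learning rate is derived from the historical learning rates, and a single extra parameter is initialized to produce an automatically decreasing schedule. A minimal sketch of that general pattern is given below; the specific decay rule `next_lr` and the hyperparameter `gamma` are illustrative assumptions, not the paper's actual formula, which this record does not state.

```python
import numpy as np

# Hypothetical decay rule for illustration only: the current learning rate
# is computed from the previous one via a single extra hyperparameter gamma.
# This particular choice makes 1/lr grow by gamma each step (an O(1/t)
# schedule); it is NOT the formula derived in the paper.
def next_lr(prev_lr: float, gamma: float) -> float:
    return prev_lr / (1.0 + gamma * prev_lr)

# Toy problem: least-squares regression trained with plain SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=256)

w = np.zeros(10)
lr, gamma = 0.5, 0.1  # initial learning rate and the single extra parameter
for step in range(2000):
    idx = rng.integers(0, 256, size=32)           # sample a minibatch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / 32  # stochastic gradient
    w -= lr * grad                                # SGD update
    lr = next_lr(lr, gamma)                       # automatic decay from history

print(f"final lr = {lr:.4f}, parameter error = {np.linalg.norm(w - w_true):.4f}")
```

Under this assumed rule the learning rate decays on the order of 1/t with no manually scheduled milestones, which matches the behavior the abstract attributes to the method: after initializing one parameter, every subsequent learning rate follows automatically from the history.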
Pages: 7334-7355 (22 pages)
Related papers (50 in total)
• [11] Zhuo, Li'an; Zhang, Baochang; Chen, Chen; Ye, Qixiang; Liu, Jianzhuang; Doermann, David. Calibrated Stochastic Gradient Descent for Convolutional Neural Networks. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019: 9348-9355.
• [12] Prazeres, Mariana; Oberman, Adam M. Stochastic Gradient Descent with Polyak's Learning Rate. Journal of Scientific Computing, 2021, 89(1).
• [14] Shim, Duk-Sun; Shim, Joseph. A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning. International Journal of Control, Automation and Systems, 2023, 21(11): 3825-3831.
• [16] Izadi, Mohammad Rasool; Fang, Yihao; Stevenson, Robert; Lin, Lizhen. Optimization of Graph Neural Networks with Natural Gradient Descent. 2020 IEEE International Conference on Big Data (Big Data), 2020: 171-179.
• [17] Temsamani, Yassine Khallouk; Achchab, Said. Efficient Optimization of Neural Networks for Predictive Hiring: An In-Depth Approach to Stochastic Gradient Descent. 2024 5th International Conference on Computing, Networks and Internet of Things (CNIOT 2024), 2024: 588-594.
• [18] Li, Qunwei; Zou, Shaofeng; Zhong, Wenliang. Learning Graph Neural Networks with Approximate Gradient Descent. Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021, 35: 8438-8446.
• [19] Kobayashi, Masaki. Gradient descent learning for quaternionic Hopfield neural networks. Neurocomputing, 2017, 260: 174-179.
• [20] Nguegnang, Gabin Maxime; Rauhut, Holger; Terstiege, Ulrich. Convergence of gradient descent for learning linear neural networks. Advances in Continuous and Discrete Models, 2024, 2024(1).