An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Cited: 10
Authors
Wang, Kang [1 ]
Dou, Yong [1 ]
Sun, Tao [1 ]
Qiao, Peng [1 ]
Wen, Dong [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Natl Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
automatic learning rate decay strategy; fast convergence; generalization performance; neural networks; stochastic gradient descent optimization algorithms; training stability; ARCHITECTURES; GAME; GO;
DOI
10.1002/int.22883
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Stochastic gradient descent (SGD) optimization methods play a vital role in training neural networks and are attracting growing attention in the science and engineering of intelligent systems. The choice of learning rate affects the convergence rate of SGD-type optimization methods. Current learning rate adjustment strategies mainly face two problems: (1) traditional learning rate decay schedules are mostly tuned manually over the training iterations, and the small learning rates they produce lead to slow convergence when training neural networks; (2) adaptive methods (e.g., Adam) tend to generalize poorly. To alleviate these issues, we propose a novel automatic learning rate decay strategy for SGD optimization methods in neural networks. Based on the observation that the upper bound on the convergence rate is minimized at each iteration with respect to the current learning rate, we first derive an expression for the current learning rate in terms of the historical learning rates. Only one extra parameter needs to be initialized to generate automatically decreasing learning rates during training. The proposed approach is applied to the SGD and momentum SGD optimization algorithms, and its convergence is established with a concrete theoretical proof. Numerical experiments are conducted on the MNIST and CIFAR-10 data sets with different neural networks. The results show that our algorithm outperforms existing classical ones, achieving a faster convergence rate, better training stability, and better generalization performance in neural network training. It also lays a foundation for large-scale parallel search of initial parameters in intelligent systems.
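The idea of a single-parameter, automatically decreasing step size can be illustrated with a minimal sketch. The snippet below does not reproduce the paper's derived expression for the learning rate; it assumes a generic stand-in schedule, lr_t = lr_0 / (1 + gamma * t), where gamma is the one extra decay parameter, applied to plain mini-batch SGD on a synthetic least-squares problem of the author's choosing.

import numpy as np

# Minimal illustrative sketch (assumed stand-in schedule, not the paper's formula):
# mini-batch SGD on a synthetic least-squares problem with a learning rate that
# decays automatically from a single extra hyperparameter `gamma`.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(10)
lr0, gamma = 0.1, 0.01          # initial learning rate and the single decay parameter
batch_size, steps = 32, 2000

for t in range(steps):
    idx = rng.integers(0, len(X), size=batch_size)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size   # stochastic gradient of the squared loss
    lr_t = lr0 / (1.0 + gamma * t)                          # automatically decreasing step size
    w -= lr_t * grad

print("final mean squared error:", float(np.mean((X @ w - y) ** 2)))

In the paper itself the schedule is instead derived from minimizing the convergence-rate upper bound with respect to the current learning rate, so the decay depends on the historical learning rates rather than on the iteration counter alone.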
Pages: 7334-7355 (22 pages)