Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Cited by: 3
Authors
Hao, Zhiyong [1]
Jiang, Yixuan [1]
Yu, Huihua [1]
Chiang, Hsiao-Dong [1]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
Keywords
Optimization algorithm; Line search; Deep learning; Convergence conditions; Minimization
DOI
10.1007/978-3-030-86523-8_23
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent progress in deep learning relies heavily on the quality and efficiency of training algorithms. In this paper, we develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework: the Conjugate Gradient with Quadratic line-search (CGQ) method. On the one hand, a quadratic line search determines the step size according to the current loss landscape; on the other hand, the momentum factor is dynamically updated via the conjugate gradient parameter (in the spirit of Polak-Ribière). Theoretical results ensuring the convergence of our method in strongly convex settings are developed, and experiments on image classification datasets show that our method converges faster than other local solvers and generalizes better (higher test-set accuracy). A major advantage of the proposed method is that tedious hand-tuning of hyperparameters such as the learning rate and momentum is avoided.
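To make the two adaptive quantities concrete, below is a minimal NumPy sketch of a CGQ-style update, not the authors' implementation: `quadratic_line_search` fits a parabola to three loss evaluations along the search direction and returns its minimizer as the step size, while `cgq_step` computes a Polak-Ribière momentum factor to build the conjugate direction. The trial step `t`, the clipping of beta at zero (PR+), and the fallback when the fitted parabola is not convex are illustrative assumptions.

```python
import numpy as np

def quadratic_line_search(loss_fn, w, d, t=0.1):
    # Evaluate the loss at three trial steps along direction d, fit a
    # parabola phi(a) = c2*a^2 + c1*a + c0, and return its minimizer.
    a = np.array([0.0, t, 2.0 * t])
    phi = np.array([loss_fn(w + ai * d) for ai in a])
    c2, c1, _ = np.polyfit(a, phi, 2)   # coefficients, highest degree first
    if c2 <= 0.0:                       # assumed fallback: the fit is not
        return t                        # convex along d, keep the trial step
    return -c1 / (2.0 * c2)             # argmin of the fitted parabola

def cgq_step(loss_fn, grad_fn, w, g_prev, d_prev):
    # One CGQ-style update: a Polak-Ribiere beta acts as the dynamic
    # momentum factor, and the line search supplies the learning rate.
    g = grad_fn(w)
    beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))  # PR+ (assumed)
    d = -g + beta * d_prev              # conjugate search direction
    alpha = quadratic_line_search(loss_fn, w, d)
    return w + alpha * d, g, d

# Toy usage on a strongly convex quadratic f(w) = 0.5*w'Aw - b'w,
# the setting covered by the paper's convergence results.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
loss_fn = lambda w: 0.5 * w @ A @ w - b @ w
grad_fn = lambda w: A @ w - b
w = np.zeros(2)
g, d = grad_fn(w), -grad_fn(w)
for _ in range(10):
    w, g, d = cgq_step(loss_fn, grad_fn, w, g, d)
print(w, np.linalg.solve(A, b))         # iterate approaches the true minimizer
```

Note that neither alpha nor beta is a hand-set hyperparameter here: both are recomputed from the loss and gradients at every step, which is the sense in which the method avoids tuning the learning rate and momentum.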
Pages: 381-396
Page count: 16
Related Papers
50 records in total
  • [1] Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks
    Iiduka, Hideaki
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13250 - 13261
  • [2] Adaptive Learning Rate for Unsupervised Learning of Deep Neural Networks
    Golovko, Vladimir
    Mikhno, Egor
    Kroschanka, Aliaksandr
    Chodyka, Marta
    Lichograj, Piotr
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [3] A Nonmonotone Learning Rate Strategy for SGD Training of Deep Neural Networks
    Keskar, Nitish Shirish
    Saon, George
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4974 - 4978
  • [4] Constructive Approaches for Training of Wavelet Neural Networks Using Adaptive Learning Rate
    Skhiri, Mohamed Zine El Abidine
    Chtourou, Mohamed
    [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2013, 11 (03)
  • [5] Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks
    Wu, Yanzhao
    Liu, Ling
    Bae, Juhyun
    Chow, Ka-Ho
    Iyengar, Arun
    Pu, Calton
    Wei, Wenqi
    Yu, Lei
    Zhang, Qi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 1971 - 1980
  • [6] AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for neural networks
    Sun, Hao
    Shen, Li
    Zhong, Qihuang
    Ding, Liang
    Chen, Shixiang
    Sun, Jingwei
    Li, Jing
    Sun, Guangzhong
    Tao, Dacheng
    [J]. NEURAL NETWORKS, 2024, 169 : 506 - 519
  • [7] A fast adaptive algorithm for training deep neural networks
    Gui, Yangting
    Li, Dequan
    Fang, Runyue
    [J]. APPLIED INTELLIGENCE, 2023, 53 (04) : 4099 - 4108
  • [8] Speaker Adaptive Training Using Deep Neural Networks
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Lu, Xugang
    Hori, Chiori
    Katagiri, Shigeru
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] Improvements to Speaker Adaptive Training of Deep Neural Networks
    Miao, Yajie
    Jiang, Lu
    Zhang, Hao
    Metze, Florian
    [J]. 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 165 - 170