Adaptive Learning Rate and Momentum for Training Deep Neural Networks

Cited by: 3
Authors
Hao, Zhiyong [1]
Jiang, Yixuan [1]
Yu, Huihua [1]
Chiang, Hsiao-Dong [1]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
Keywords
Optimization algorithm; Line search; Deep learning; Convergence conditions; Minimization
DOI
10.1007/978-3-030-86523-8_23
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent progress in deep learning relies heavily on the quality and efficiency of training algorithms. In this paper, we develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework: the Conjugate Gradient with Quadratic line-search (CGQ) method. On the one hand, a quadratic line search determines the step size according to the current loss landscape; on the other hand, the momentum factor is dynamically updated via the conjugate gradient parameter (in the spirit of Polak-Ribière). Theoretical results ensuring the convergence of our method in strongly convex settings are developed, and experiments on image classification datasets show that our method converges faster than other local solvers and generalizes better (higher test-set accuracy). A major advantage of the proposed method is that tedious hand-tuning of hyperparameters such as the learning rate and momentum is avoided.
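To make the two adaptive quantities concrete, below is a minimal NumPy sketch of a CGQ-style update, not the authors' implementation: `quadratic_line_search` fits a parabola to three loss evaluations along the search direction and returns its minimizer as the step size, while `cgq_step` computes a Polak-Ribière momentum factor to build the conjugate direction. The trial step `t`, the clipping of beta at zero (PR+), and the fallback when the fitted parabola is not convex are illustrative assumptions.

```python
import numpy as np

def quadratic_line_search(loss_fn, w, d, t=0.1):
    # Evaluate the loss at three trial steps along direction d, fit a
    # parabola phi(a) = c2*a^2 + c1*a + c0, and return its minimizer.
    a = np.array([0.0, t, 2.0 * t])
    phi = np.array([loss_fn(w + ai * d) for ai in a])
    c2, c1, _ = np.polyfit(a, phi, 2)   # coefficients, highest degree first
    if c2 <= 0.0:                       # assumed fallback: the fit is not
        return t                        # convex along d, keep the trial step
    return -c1 / (2.0 * c2)             # argmin of the fitted parabola

def cgq_step(loss_fn, grad_fn, w, g_prev, d_prev):
    # One CGQ-style update: a Polak-Ribiere beta acts as the dynamic
    # momentum factor, and the line search supplies the learning rate.
    g = grad_fn(w)
    beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))  # PR+ (assumed)
    d = -g + beta * d_prev              # conjugate search direction
    alpha = quadratic_line_search(loss_fn, w, d)
    return w + alpha * d, g, d

# Toy usage on a strongly convex quadratic f(w) = 0.5*w'Aw - b'w,
# the setting covered by the paper's convergence results.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
loss_fn = lambda w: 0.5 * w @ A @ w - b @ w
grad_fn = lambda w: A @ w - b
w = np.zeros(2)
g, d = grad_fn(w), -grad_fn(w)
for _ in range(10):
    w, g, d = cgq_step(loss_fn, grad_fn, w, g, d)
print(w, np.linalg.solve(A, b))         # iterate approaches the true minimizer
```

Note that neither alpha nor beta is a hand-set hyperparameter here: both are recomputed from the loss and gradients at every step, which is the sense in which the method avoids tuning the learning rate and momentum.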
Pages: 381-396
Page count: 16
Related Papers
50 records in total
  • [1] Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks
    Iiduka, Hideaki
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13250 - 13261
  • [2] Adaptive Learning Rate for Unsupervised Learning of Deep Neural Networks
    Golovko, Vladimir
    Mikhno, Egor
    Kroschanka, Aliaksandr
    Chodyka, Marta
    Lichograj, Piotr
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [3] A Nonmonotone Learning Rate Strategy for SGD Training of Deep Neural Networks
    Keskar, Nitish Shirish
    Saon, George
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4974 - 4978
  • [4] Constructive Approaches for Training of Wavelet Neural Networks Using Adaptive Learning Rate
    Skhiri, Mohamed Zine El Abidine
    Chtourou, Mohamed
    [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2013, 11 (03)
  • [5] Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks
    Wu, Yanzhao
    Liu, Ling
    Bae, Juhyun
    Chow, Ka-Ho
    Iyengar, Arun
    Pu, Calton
    Wei, Wenqi
    Yu, Lei
    Zhang, Qi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 1971 - 1980
  • [6] AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for neural networks
    Sun, Hao
    Shen, Li
    Zhong, Qihuang
    Ding, Liang
    Chen, Shixiang
    Sun, Jingwei
    Li, Jing
    Sun, Guangzhong
    Tao, Dacheng
    [J]. NEURAL NETWORKS, 2024, 169 : 506 - 519
  • [7] A fast adaptive algorithm for training deep neural networks
    Gui, Yangting
    Li, Dequan
    Fang, Runyue
    [J]. APPLIED INTELLIGENCE, 2023, 53 (04) : 4099 - 4108
  • [8] Speaker Adaptive Training Using Deep Neural Networks
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Lu, Xugang
    Hori, Chiori
    Katagiri, Shigeru
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] Improvements to Speaker Adaptive Training of Deep Neural Networks
    Miao, Yajie
    Jiang, Lu
    Zhang, Hao
    Metze, Florian
    [J]. 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 165 - 170