Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks

Cited by: 31
Authors
Iiduka, Hideaki [1]
Affiliations
[1] Meiji Univ, Dept Comp Sci, Tokyo, Kanagawa 2148571, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Optimization; Convergence; Stochastic processes; Deep learning; Approximation algorithms; Training; Heuristic algorithms; Adaptive mean square gradient (AMSGrad); adaptive moment estimation (Adam); adaptive-learning-rate optimization algorithm; deep neural network; learning rate; nonconvex stochastic optimization; SUBGRADIENT METHODS
DOI
10.1109/TCYB.2021.3107415
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This article deals with nonconvex stochastic optimization problems in deep learning. It provides theoretically grounded learning rates for adaptive-learning-rate optimization algorithms (e.g., Adam and AMSGrad) to approximate the stationary points of such problems. These rates are shown to allow faster convergence than previously reported for these algorithms. In numerical experiments on text and image classification, the algorithms perform better with constant learning rates than with diminishing learning rates.
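The claim above concerns the step-size schedule fed to Adam and AMSGrad: a suitably chosen constant learning rate versus the commonly used diminishing schedule (e.g., alpha/sqrt(t)). As a minimal illustrative sketch only, not the article's analysis, the Python code below implements the standard Adam update with an optional AMSGrad correction under a constant learning rate; the hyperparameter values and the toy noisy quadratic objective are assumptions chosen for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, vhat, t, alpha=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    # Biased first- and second-moment estimates of the stochastic gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    if amsgrad:
        vhat = np.maximum(vhat, v)  # AMSGrad: running elementwise max of v
        denom = np.sqrt(vhat) + eps
    else:
        denom = np.sqrt(v) + eps
    # Bias-corrected step built from a CONSTANT alpha (no alpha/sqrt(t) decay).
    alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    return theta - alpha_t * m / denom, m, v, vhat

# Toy problem: minimize f(theta) = ||theta||^2 from noisy gradient estimates.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
vhat = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta + 0.1 * rng.normal(size=theta.shape)  # stochastic gradient
    theta, m, v, vhat = adam_step(theta, grad, m, v, vhat, t, amsgrad=True)
print(np.linalg.norm(theta))  # approaches 0 under the constant step size
```

Replacing the constant alpha with a diminishing schedule (e.g., alpha_t proportional to 1/sqrt(t)) inside the loop reproduces the kind of comparison the article's experiments make, though the specific rates analyzed there come from its theory.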
Pages: 13250-13261
Page count: 12
Related Papers
50 items in total
  • [41] Selecting and Composing Learning Rate Policies for Deep Neural Networks
    Wu, Yanzhao
    Liu, Ling
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)
  • [42] Research advances in deep neural networks learning rate strategies
    Liu, Yun-Fei
    Zhang, Jun-Ran
    [J]. KONGZHI YU JUECE/CONTROL AND DECISION, 2023, 38 (09): 2444 - 2460
  • [43] Learning-Rate Annealing Methods for Deep Neural Networks
    Nakamura, Kensuke
    Derbel, Bilel
    Won, Kyoung-Jae
    Hong, Byung-Woo
    [J]. ELECTRONICS, 2021, 10 (16)
  • [44] Adaptive learning algorithms to incorporate additional functional constraints into neural networks
    Jeong, SY
    Lee, SY
    [J]. WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS: ISAS '98, 1998, : 574 - 580
  • [45] Adaptive learning algorithms to incorporate additional functional constraints into neural networks
    Jeong, SY
    Lee, SY
    [J]. NEUROCOMPUTING, 2000, 35 : 73 - 90
  • [46] A unified framework of online learning algorithms for training recurrent neural networks
    Marschall, Owen
    Cho, Kyunghyun
    Savin, Cristina
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [47] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05)
  • [48] Structure Learning for Deep Neural Networks Based on Multiobjective Optimization
    Liu, Jia
    Gong, Maoguo
    Miao, Qiguang
    Wang, Xiaogang
    Li, Hao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2450 - 2463
  • [49] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05): 17 - 31
  • [50] Gradient-only surrogate to resolve learning rates for robust and consistent training of deep neural networks
    Chae, Younghwan
    Wilke, Daniel N.
    Kafka, Dominic
    [J]. APPLIED INTELLIGENCE, 2023, 53: 13741 - 13762