Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks

Cited by: 31
Authors
Iiduka, Hideaki [1]
Affiliations
[1] Meiji Univ, Dept Comp Sci, Tokyo, Kanagawa 2148571, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Optimization; Convergence; Stochastic processes; Deep learning; Approximation algorithms; Training; Heuristic algorithms; Adaptive mean square gradient (AMSGrad); adaptive moment estimation (Adam); adaptive-learning-rate optimization algorithm; deep neural network; learning rate; nonconvex stochastic optimization; SUBGRADIENT METHODS
DOI
10.1109/TCYB.2021.3107415
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This article deals with nonconvex stochastic optimization problems in deep learning. It provides theoretically grounded learning rates for adaptive-learning-rate optimization algorithms (e.g., Adam and AMSGrad) to approximate the stationary points of such problems. These rates are shown to allow faster convergence than previously reported for these algorithms. In numerical experiments on text and image classification, the algorithms perform better with constant learning rates than with diminishing learning rates.
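The claim above concerns the step-size schedule fed to Adam and AMSGrad: a suitably chosen constant learning rate versus the commonly used diminishing schedule (e.g., alpha/sqrt(t)). As a minimal illustrative sketch only, not the article's analysis, the Python code below implements the standard Adam update with an optional AMSGrad correction under a constant learning rate; the hyperparameter values and the toy noisy quadratic objective are assumptions chosen for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, vhat, t, alpha=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    # Biased first- and second-moment estimates of the stochastic gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    if amsgrad:
        vhat = np.maximum(vhat, v)  # AMSGrad: running elementwise max of v
        denom = np.sqrt(vhat) + eps
    else:
        denom = np.sqrt(v) + eps
    # Bias-corrected step built from a CONSTANT alpha (no alpha/sqrt(t) decay).
    alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    return theta - alpha_t * m / denom, m, v, vhat

# Toy problem: minimize f(theta) = ||theta||^2 from noisy gradient estimates.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
vhat = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta + 0.1 * rng.normal(size=theta.shape)  # stochastic gradient
    theta, m, v, vhat = adam_step(theta, grad, m, v, vhat, t, amsgrad=True)
print(np.linalg.norm(theta))  # approaches 0 under the constant step size
```

Replacing the constant alpha with a diminishing schedule (e.g., alpha_t proportional to 1/sqrt(t)) inside the loop reproduces the kind of comparison the article's experiments make, though the specific rates analyzed there come from its theory.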
Pages: 13250-13261
Page count: 12
Related Papers
50 items in total
  • [41] Selecting and Composing Learning Rate Policies for Deep Neural Networks
    Wu, Yanzhao
    Liu, Ling
    [J]. ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)
  • [42] Research advances in deep neural networks learning rate strategies
    Liu, Yun-Fei
    Zhang, Jun-Ran
    [J]. KONGZHI YU JUECE/CONTROL AND DECISION, 2023, 38 (09): 2444 - 2460
  • [43] Learning-Rate Annealing Methods for Deep Neural Networks
    Nakamura, Kensuke
    Derbel, Bilel
    Won, Kyoung-Jae
    Hong, Byung-Woo
    [J]. ELECTRONICS, 2021, 10 (16)
  • [44] Adaptive learning algorithms to incorporate additional functional constraints into neural networks
    Jeong, SY
    Lee, SY
    [J]. WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL 1, PROCEEDINGS: ISAS '98, 1998, : 574 - 580
  • [45] Adaptive learning algorithms to incorporate additional functional constraints into neural networks
    Jeong, SY
    Lee, SY
    [J]. NEUROCOMPUTING, 2000, 35 : 73 - 90
  • [46] A unified framework of online learning algorithms for training recurrent neural networks
    Marschall, Owen
    Cho, Kyunghyun
    Savin, Cristina
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [47] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05)
  • [48] Structure Learning for Deep Neural Networks Based on Multiobjective Optimization
    Liu, Jia
    Gong, Maoguo
    Miao, Qiguang
    Wang, Xiaogang
    Li, Hao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2450 - 2463
  • [49] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05): 17 - 31
  • [50] Gradient-only surrogate to resolve learning rates for robust and consistent training of deep neural networks
    Chae, Younghwan
    Wilke, Daniel N.
    Kafka, Dominic
    [J]. APPLIED INTELLIGENCE, 2023, 53: 13741 - 13762