Learning dynamics of gradient descent optimization in deep neural networks

Cited by: 19
Authors
Wu, Wei [1 ]
Jing, Xiaoyuan [1 ]
Du, Wencai [2 ]
Chen, Guoliang [3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] City Univ Macau, Inst Data Sci, Macau 999078, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
learning dynamics; deep neural networks; gradient descent; control model; transfer function
DOI
10.1007/s11432-020-3163-0
CLC Number
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of these complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been refined into many derived forms, including SGD with momentum and Nesterov accelerated gradient (NAG). However, the learning dynamics of optimizer parameters have seldom been studied. We propose to understand model dynamics from the perspective of control theory. We use the state transfer function to approximate the parameter dynamics of different optimizers as first- or second-order control systems, thereby explaining theoretically how these parameters affect the stability and convergence time of deep learning models, and we verify our findings through numerical experiments.
Pages: 15
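For reference, the update rules named in the abstract, and one standard way to cast them as discrete control systems via the z-transform, can be sketched as follows. This is an illustrative derivation under assumed conventions (in particular, where the learning rate \eta enters each update), not necessarily the paper's own formulation:

\begin{aligned}
\text{SGD:} \quad & \theta_{t+1} = \theta_t - \eta\, g_t, & g_t &= \nabla L(\theta_t), \\
\text{Momentum:} \quad & v_{t+1} = \gamma v_t + g_t, & \theta_{t+1} &= \theta_t - \eta\, v_{t+1}, \\
\text{NAG:} \quad & v_{t+1} = \gamma v_t + \nabla L(\theta_t - \eta \gamma v_t), & \theta_{t+1} &= \theta_t - \eta\, v_{t+1}.
\end{aligned}

Treating the gradient g_t as the system input and \theta_t as the output, plain SGD gives the first-order transfer function H(z) = -\eta / (z - 1), while momentum SGD gives the second-order H(z) = -\eta z / \big( (z - 1)(z - \gamma) \big): besides the integrator pole at z = 1, stability hinges on the momentum pole, i.e., |\gamma| < 1.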
Related Papers
50 records in total
  • [31] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    ENTROPY, 2020, 22 (05)
  • [32] Evaluation of Gradient Descent Optimization: Using Android Applications in Neural Networks
    Alshahrani, Hani
    Alzahrani, Abdulrahman
    Alshehri, Ali
    Alharthi, Raed
    Fu, Huirong
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017: 1471-1476
  • [33] Inversion of Neural Networks by Gradient Descent
    Kindermann, J.
    Linden, A.
    PARALLEL COMPUTING, 1990, 14 (03): 277-286
  • [34] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [35] Analysis of Gradient Descent Learning Algorithms for Multilayer Feedforward Neural Networks
    Guo, H.
    Gelfand, S. B.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1991, 38 (08): 883-894
  • [36] Meta-learning spiking neural networks with surrogate gradient descent
    Stewart, Kenneth M.
    Neftci, Emre O.
    NEUROMORPHIC COMPUTING AND ENGINEERING, 2022, 2 (04)
  • [37] Annealed Gradient Descent for Deep Learning
    Pan, Hengyue
    Jiang, Hui
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015: 652-661
  • [38] Annealed gradient descent for deep learning
    Pan, Hengyue
    Niu, Xin
    Li, RongChun
    Dou, Yong
    Jiang, Hui
    NEUROCOMPUTING, 2020, 380: 201-211
  • [39] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    JOURNAL OF COMPLEXITY, 2021, 64
  • [40] Iterative deep neural networks based on proximal gradient descent for image restoration
    Lv, Ting
    Pan, Zhenkuan
    Wei, Weibo
    Yang, Guangyu
    Song, Jintao
    Wang, Xuqing
    Sun, Lu
    Li, Qian
    Sun, Xiatao
    PLOS ONE, 2022, 17 (11)