Learning dynamics of gradient descent optimization in deep neural networks

Cited by: 19
Authors
Wu, Wei [1 ]
Jing, Xiaoyuan [1 ]
Du, Wencai [2 ]
Chen, Guoliang [3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] City Univ Macau, Inst Data Sci, Macau 999078, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
learning dynamics; deep neural networks; gradient descent; control model; transfer function;
DOI
10.1007/s11432-020-3163-0
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of such complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been refined into many derived forms, including SGD with momentum and Nesterov accelerated gradient (NAG). However, the learning dynamics of optimizer parameters have seldom been studied. We propose to understand the model dynamics from the perspective of control theory. We use the status transfer function to approximate the parameter dynamics of different optimizers as first- or second-order control systems, thus explaining how the parameters theoretically affect the stability and convergence time of deep learning models, and we verify our findings by numerical experiments.
Pages: 15
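
The abstract refers to SGD, SGD with momentum, and NAG as the optimizers whose parameter dynamics are modelled. As a rough, self-contained illustration of the standard update rules involved (a minimal sketch under assumed settings: the 1-D quadratic loss, the helper names grad and run, and the values of lr and mu are all illustrative, and the paper's transfer-function derivation is not reproduced here), the snippet below compares the three update rules on a simple quadratic.

# Minimal sketch (assumed example, not taken from the paper): the standard
# SGD, heavy-ball momentum, and NAG updates on a 1-D quadratic loss
# L(w) = 0.5 * a * w**2, whose gradient is a * w.

def grad(w, a=2.0):
    """Gradient of the illustrative quadratic loss L(w) = 0.5 * a * w**2."""
    return a * w

def run(optimizer, steps=50, w0=1.0, lr=0.1, mu=0.9):
    """Run one optimizer for a fixed number of steps and return the final w."""
    w, v = w0, 0.0
    for _ in range(steps):
        if optimizer == "sgd":
            # Vanilla SGD: w <- w - lr * grad(w)
            w = w - lr * grad(w)
        elif optimizer == "momentum":
            # Heavy-ball momentum: v <- mu * v - lr * grad(w); w <- w + v
            v = mu * v - lr * grad(w)
            w = w + v
        elif optimizer == "nag":
            # Nesterov accelerated gradient: evaluate the gradient at the
            # look-ahead point w + mu * v before applying the update.
            v = mu * v - lr * grad(w + mu * v)
            w = w + v
    return w

if __name__ == "__main__":
    for name in ("sgd", "momentum", "nag"):
        print(f"{name:>8}: w after 50 steps = {run(name):+.6f}")

On a quadratic objective like this, the momentum and NAG iterates follow a linear second-order recurrence, so they can overshoot and oscillate before settling, while vanilla SGD follows a first-order recurrence; this is the first- versus second-order distinction the abstract draws from control theory.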