On the Convergence Rate of Training Recurrent Neural Networks

Cited: 0
Authors
Allen-Zhu, Zeyuan [1 ]
Li, Yuanzhi [2 ]
Song, Zhao [3 ]
Affiliations
[1] Microsoft Res AI, Redmond, WA 98052 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] UT Austin, Austin, TX USA
Keywords
MODEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima when training multi-layer neural networks? Why can they fit random labels even on non-convex and non-smooth architectures? Most existing theory covers only networks with a single hidden layer, so can we go deeper? In this paper, we focus on recurrent neural networks (RNNs), which are multi-layer networks widely used in natural language processing. They are harder to analyze than feedforward networks because the same recurrent unit is applied repeatedly across the entire time horizon of length L, making them analogous to feedforward networks of depth L. We show that when the number of neurons is sufficiently large, meaning polynomial in the training data size and in L, SGD minimizes the regression loss at a linear convergence rate. This gives theoretical evidence of how RNNs can memorize data. More importantly, we build general toolkits for analyzing multi-layer networks with ReLU activations. For instance, we prove why ReLU activations can prevent exponential gradient explosion or vanishing, and we build a perturbation theory for analyzing first-order approximations of multi-layer networks.
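The abstract's central claim, that gradient-based training on a sufficiently wide ReLU RNN drives the regression loss toward zero, can be illustrated numerically. The sketch below is not the paper's analysis or algorithm: it trains a toy ReLU RNN on a few random labels with plain gradient descent, using finite-difference gradients for simplicity. All sizes, seeds, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, L, n = 12, 3, 4, 3   # hidden width, input dim, horizon, #samples

X = rng.normal(size=(n, L, d))   # n input sequences of length L
y = rng.normal(size=n)           # random labels to memorize

# Only the recurrent weights W are trained; input/output weights stay fixed,
# mirroring the setting where the recurrent unit carries the capacity.
W = rng.normal(size=(m, m)) / np.sqrt(m)
A = rng.normal(size=(m, d)) / np.sqrt(d)
b = rng.normal(size=m) / np.sqrt(m)

def loss(W):
    """Average squared regression loss of the ReLU RNN over all samples."""
    total = 0.0
    for i in range(n):
        h = np.zeros(m)
        for t in range(L):
            h = np.maximum(W @ h + A @ X[i, t], 0.0)  # ReLU recurrence
        total += 0.5 * (b @ h - y[i]) ** 2
    return total / n

def num_grad(W, eps=1e-5):
    """Central finite-difference gradient: slow but dependency-free."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        G[idx] = (loss(Wp) - loss(Wm)) / (2 * eps)
    return G

losses = [loss(W)]
for step in range(40):
    W = W - 0.05 * num_grad(W)   # plain gradient descent, small fixed step
    losses.append(loss(W))
```

With the hidden width (m = 12) large relative to the sample count (n = 3), the loss curve in `losses` decreases steadily, consistent with the overparameterized regime the paper analyzes; the polynomial width bound and the linear rate themselves are established only in the paper's proofs.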
Pages: 13
Related Papers
50 records in total
  • [1] Convergence analysis of recurrent neural networks
    Dai Yi
    Cong Shuang
    [J]. PROCEEDINGS OF 2004 CHINESE CONTROL AND DECISION CONFERENCE, 2004, : 443 - 447
  • [2] Convergence Study in Extended Kalman Filter-based Training of Recurrent Neural Networks
    Wang, Xiaoyu
    Huang, Yong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (04) : 588 - 600
  • [3] On convergence rate of projection neural networks
    Xia, YS
    Feng, G
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49 (01) : 91 - 96
  • [4] Convergence of diagonal recurrent neural networks' learning
    Wang, P
    Li, YF
    Feng, S
    Wei, W
    [J]. PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002, : 2365 - 2369
  • [5] A CONVERGENCE RESULT FOR LEARNING IN RECURRENT NEURAL NETWORKS
    KUAN, CM
    HORNIK, K
    WHITE, H
    [J]. NEURAL COMPUTATION, 1994, 6 (03) : 420 - 440
  • [6] Convergence result for learning in recurrent neural networks
    Kuan, Chung-Ming
    Hornik, Kurt
    White, Halbert
    [J]. Neural Computation, 1994, 6 (03)
  • [7] Training of a class of recurrent neural networks
    Shaaban, EM
    [J]. ISCAS '98 - PROCEEDINGS OF THE 1998 INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-6, 1998, : B78 - B81
  • [8] Convergence of Adversarial Training in Overparametrized Neural Networks
    Gao, Ruiqi
    Cai, Tianle
    Li, Haochuan
    Wang, Liwei
    Hsieh, Cho-Jui
    Lee, Jason D.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] An algorithm for fast convergence in training neural networks
    Wilamowski, BM
    Iplikci, S
    Kaynak, O
    Efe, MÖ
    [J]. IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001, : 1778 - 1782
  • [10] ON THE RATE OF CONVERGENCE IN TOPOLOGY PRESERVING NEURAL NETWORKS
    LO, ZP
    BAVARIAN, B
    [J]. BIOLOGICAL CYBERNETICS, 1991, 65 (01) : 55 - 63