On the Convergence Rate of Training Recurrent Neural Networks

Cited: 0
Authors
Allen-Zhu, Zeyuan [1 ]
Li, Yuanzhi [2 ]
Song, Zhao [3 ]
Affiliations
[1] Microsoft Res AI, Redmond, WA 98052 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] UT Austin, Austin, TX USA
Keywords
MODEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima when training multi-layer neural networks? Why can they fit random labels even on non-convex and non-smooth architectures? Most existing theory covers only networks with a single hidden layer, so can we go deeper? In this paper, we focus on recurrent neural networks (RNNs), which are multi-layer networks widely used in natural language processing. They are harder to analyze than feedforward networks because the same recurrent unit is applied repeatedly across the entire time horizon of length L, making them analogous to feedforward networks of depth L. We show that when the number of neurons is sufficiently large, meaning polynomial in the training data size and in L, SGD minimizes the regression loss at a linear convergence rate. This gives theoretical evidence of how RNNs can memorize data. More importantly, we build general toolkits for analyzing multi-layer networks with ReLU activations. For instance, we prove why ReLU activations can prevent exponential gradient explosion or vanishing, and we build a perturbation theory for analyzing first-order approximations of multi-layer networks.
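The abstract's central claim, that gradient-based training on a sufficiently wide ReLU RNN drives the regression loss toward zero, can be illustrated numerically. The sketch below is not the paper's analysis or algorithm: it trains a toy ReLU RNN on a few random labels with plain gradient descent, using finite-difference gradients for simplicity. All sizes, seeds, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, L, n = 12, 3, 4, 3   # hidden width, input dim, horizon, #samples

X = rng.normal(size=(n, L, d))   # n input sequences of length L
y = rng.normal(size=n)           # random labels to memorize

# Only the recurrent weights W are trained; input/output weights stay fixed,
# mirroring the setting where the recurrent unit carries the capacity.
W = rng.normal(size=(m, m)) / np.sqrt(m)
A = rng.normal(size=(m, d)) / np.sqrt(d)
b = rng.normal(size=m) / np.sqrt(m)

def loss(W):
    """Average squared regression loss of the ReLU RNN over all samples."""
    total = 0.0
    for i in range(n):
        h = np.zeros(m)
        for t in range(L):
            h = np.maximum(W @ h + A @ X[i, t], 0.0)  # ReLU recurrence
        total += 0.5 * (b @ h - y[i]) ** 2
    return total / n

def num_grad(W, eps=1e-5):
    """Central finite-difference gradient: slow but dependency-free."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        G[idx] = (loss(Wp) - loss(Wm)) / (2 * eps)
    return G

losses = [loss(W)]
for step in range(40):
    W = W - 0.05 * num_grad(W)   # plain gradient descent, small fixed step
    losses.append(loss(W))
```

With the hidden width (m = 12) large relative to the sample count (n = 3), the loss curve in `losses` decreases steadily, consistent with the overparameterized regime the paper analyzes; the polynomial width bound and the linear rate themselves are established only in the paper's proofs.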
Pages: 13
Related Papers
50 records in total
  • [1] Convergence analysis of recurrent neural networks
    Dai Yi
    Cong Shuang
    [J]. PROCEEDINGS OF 2004 CHINESE CONTROL AND DECISION CONFERENCE, 2004, : 443 - 447
  • [2] Convergence Study in Extended Kalman Filter-based Training of Recurrent Neural Networks
    Wang, Xiaoyu
    Huang, Yong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (04) : 588 - 600
  • [3] On convergence rate of projection neural networks
    Xia, YS
    Feng, G
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2004, 49 (01) : 91 - 96
  • [4] Convergence of diagonal recurrent neural networks' learning
    Wang, P
    Li, YF
    Feng, S
    Wei, W
    [J]. PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002, : 2365 - 2369
  • [5] A CONVERGENCE RESULT FOR LEARNING IN RECURRENT NEURAL NETWORKS
    KUAN, CM
    HORNIK, K
    WHITE, H
    [J]. NEURAL COMPUTATION, 1994, 6 (03) : 420 - 440
  • [6] Convergence result for learning in recurrent neural networks
    Kuan, Chung-Ming
    Hornik, Kurt
    White, Halbert
    [J]. Neural Computation, 1994, 6 (03)
  • [7] Training of a class of recurrent neural networks
    Shaaban, EM
    [J]. ISCAS '98 - PROCEEDINGS OF THE 1998 INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-6, 1998, : B78 - B81
  • [8] Convergence of Adversarial Training in Overparametrized Neural Networks
    Gao, Ruiqi
    Cai, Tianle
    Li, Haochuan
    Wang, Liwei
    Hsieh, Cho-Jui
    Lee, Jason D.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] An algorithm for fast convergence in training neural networks
    Wilamowski, BM
    Iplikci, S
    Kaynak, O
    Efe, MÖ
    [J]. IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001, : 1778 - 1782
  • [10] ON THE RATE OF CONVERGENCE IN TOPOLOGY PRESERVING NEURAL NETWORKS
    LO, ZP
    BAVARIAN, B
    [J]. BIOLOGICAL CYBERNETICS, 1991, 65 (01) : 55 - 63