A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Institutions
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; 68T99; 41A60; 65D15
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
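For intuition, here is a minimal NumPy sketch (not the authors' code, and not the exact setting analysed in the paper) of plain SGD on the squared-error risk of such a shallow ReLU network with a constant target function. The input dimension d, width H, input distribution, initialisation, learning rate, and number of steps are illustrative assumptions.

```python
# Illustrative sketch only: SGD for f(x) = v . relu(W x + b) + c with a
# constant target function. All hyperparameters below are assumptions,
# not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, H = 3, 16            # input dimension and hidden-layer width (assumed)
gamma = 1e-2            # small constant learning rate (assumed)
target = 1.0            # constant target function

W = 0.5 * rng.normal(size=(H, d))   # inner weights
b = 0.5 * rng.normal(size=H)        # inner biases
v = 0.5 * rng.normal(size=H)        # outer weights
c = 0.0                             # outer bias

def model(x):
    return v @ np.maximum(W @ x + b, 0.0) + c

for step in range(20_000):
    x = rng.uniform(-1.0, 1.0, size=d)   # i.i.d. input sample (assumed law)
    h = np.maximum(W @ x + b, 0.0)       # ReLU hidden layer
    err = (v @ h + c) - target           # d/dy of 0.5 * (y - target)^2
    active = (h > 0.0).astype(float)     # ReLU subgradient indicator
    # One stochastic (sub)gradient step on the squared-error loss
    W -= gamma * err * (v * active)[:, None] * x[None, :]
    b -= gamma * err * v * active
    v -= gamma * err * h
    c -= gamma * err

# Monte Carlo estimate of the risk after training; for a sufficiently small
# learning rate it should be close to zero, in line with the convergence
# statement above.
risk = np.mean([(model(rng.uniform(-1.0, 1.0, size=d)) - target) ** 2
                for _ in range(1_000)])
print("estimated risk:", risk)
```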
Related papers
50 records in total
  • [41] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [42] Convergence of gradient descent algorithm with penalty term for recurrent neural networks
    Ding, Xiaoshuai
    Wang, Kuaini
INTERNATIONAL JOURNAL OF MULTIMEDIA AND UBIQUITOUS ENGINEERING, 2014, 9 (09): 151 - 158
  • [43] A STOCHASTIC TRAINING ALGORITHM FOR ARTIFICIAL NEURAL NETWORKS
    BARTLETT, EB
    NEUROCOMPUTING, 1994, 6 (01) : 31 - 43
  • [44] Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
    Jentzen, Arnulf
    Welti, Timo
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 455
  • [45] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    ENTROPY, 2020, 22 (05)
  • [46] Optimizing Deep Neural Networks Through Neuroevolution With Stochastic Gradient Descent
    Zhang, Haichao
    Hao, Kuangrong
    Gao, Lei
    Wei, Bing
    Tang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (01) : 111 - 121
  • [47] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [48] Stochastic Neural Networks with Monotonic Activation Functions
    Ravanbakhsh, Siamak
    Poczos, Barnabas
    Schneider, Jeff
    Schuurmans, Dale
    Greiner, Russell
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 809 - 818
  • [49] The convergence of stochastic gradient algorithms applied to learning in neural networks
    Stankovic, S
    Tadic, V
    AUTOMATION AND REMOTE CONTROL, 1998, 59 (07) : 1002 - 1015
  • [50] Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks
    Becker, Evan
    Pandit, Parthe
    Rangan, Sundeep
    Fletcher, Alyson K.
FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF, 2023: 892 - 896