A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Affiliations
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; 68T99; 41A60; 65D15;
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
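To make the setup described in the abstract concrete, the following is a minimal NumPy sketch of plain one-sample SGD for a fully connected network with one hidden ReLU layer fitting a constant target function. It is not the article's code or proof; the width, input distribution, learning rate, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration.
d, H = 3, 16              # input dimension and hidden-layer width
target_value = 2.0        # constant target function f(x) = 2.0
learning_rate = 1e-2      # a "sufficiently small" learning rate
num_steps = 5000

rng = np.random.default_rng(0)
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(H, d))  # hidden-layer weights
b1 = np.zeros(H)                                      # hidden-layer biases
W2 = rng.normal(scale=1.0 / np.sqrt(H), size=(1, H))  # output-layer weights
b2 = np.zeros(1)                                      # output-layer bias

for step in range(num_steps):
    x = rng.uniform(0.0, 1.0, size=d)   # one i.i.d. input sample
    z = W1 @ x + b1                     # hidden pre-activations
    a = np.maximum(z, 0.0)              # ReLU activations
    y = W2 @ a + b2                     # network output
    err = y - target_value              # residual w.r.t. the constant target

    # One-sample gradients of the squared loss 0.5 * err**2.
    grad_W2 = err[:, None] * a[None, :]
    grad_b2 = err
    grad_z = (W2.T @ err) * (z > 0.0)   # ReLU subgradient, taken as 0 at the kink
    grad_W1 = grad_z[:, None] * x[None, :]
    grad_b1 = grad_z

    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

# Empirical check on a fresh sample: the squared error should be small.
x_test = rng.uniform(0.0, 1.0, size=d)
y_test = W2 @ np.maximum(W1 @ x_test + b1, 0.0) + b2
print("squared error on a fresh sample:", ((y_test - target_value) ** 2).item())
```

With a sufficiently small learning rate the printed squared error is typically close to zero, which informally mirrors the convergence of the risk established in the article.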
Related papers
50 records in total
  • [21] Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 203 (03) : 2617 - 2648
  • [22] Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks
    An, Jing
    Lu, Jianfeng
    arXiv, 2023,
  • [23] A Convergence Analysis of Gradient Descent on Graph Neural Networks
    Awasthi, Pranjal
    Das, Abhimanyu
    Gollapudi, Sreenivas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [24] Convergence of gradient descent for learning linear neural networks
    Nguegnang, Gabin Maxime
    Rauhut, Holger
    Terstiege, Ulrich
    ADVANCES IN CONTINUOUS AND DISCRETE MODELS, 2024, 2024 (01):
  • [26] Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
    Bianchi, Pascal
    Hachem, Walid
    Schechtman, Sholom
    SET-VALUED AND VARIATIONAL ANALYSIS, 2022, 30 (03) : 1117 - 1147
  • [29] Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks
    Xiong, Xia
    Chen, Yong-Cong
    Shi, Chunxiao
    Ao, Ping
    CHINESE PHYSICS LETTERS, 2023, 40 (08)
  • [30] Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
    Zhuo, Li'an
    Zhang, Baochang
    Chen, Chen
    Ye, Qixiang
    Liu, Jianzhuang
    Doermann, David
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9348 - 9355