A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Affiliations
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; Mathematics Subject Classification: 68T99; 41A60; 65D15
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
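As a rough numerical illustration of the setting described in the abstract (not a reproduction of the paper's proof or its precise assumptions), the following NumPy sketch trains a one-hidden-layer ReLU network with plain SGD on a constant target function and reports the squared error on a fresh sample. The width H, input dimension d, learning rate gamma, input distribution, and the constant c are illustrative choices, not values taken from the paper.

# Minimal SGD sketch for a one-hidden-layer ReLU network and a constant
# target function f(x) = c; all hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, H, c, gamma, steps = 2, 16, 1.5, 1e-2, 20000

# Network x -> b2 + sum_k v[k] * relu(W[k] . x + b1[k])
W = rng.normal(size=(H, d))
b1 = np.zeros(H)
v = rng.normal(size=H) / np.sqrt(H)
b2 = 0.0

for n in range(steps):
    x = rng.uniform(0.0, 1.0, size=d)   # i.i.d. input sample on [0,1]^d
    z = W @ x + b1                       # hidden pre-activations
    a = np.maximum(z, 0.0)               # ReLU activation
    y = v @ a + b2                       # network output
    err = y - c                          # residual against the constant target
    g = (z > 0.0).astype(float)          # subgradient of ReLU, 1_{z>0}
    # Gradients of the squared loss (err**2)/2 with respect to all parameters
    grad_v, grad_b2 = err * a, err
    grad_W, grad_b1 = err * np.outer(v * g, x), err * v * g
    v -= gamma * grad_v
    b2 -= gamma * grad_b2
    W -= gamma * grad_W
    b1 -= gamma * grad_b1

x_test = rng.uniform(0.0, 1.0, size=d)
y_test = v @ np.maximum(W @ x_test + b1, 0.0) + b2
print("squared error on a fresh sample:", (y_test - c) ** 2)

In this sketch the empirical risk typically decays toward zero, consistent with the convergence statement for constant target functions, small learning rates, and i.i.d. training data; it is a sanity check of the setting, not of the theorem's hypotheses.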