A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Affiliations
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; 68T99; 41A60; 65D15;
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
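To make the setup described in the abstract concrete, the following is a minimal NumPy sketch of plain one-sample SGD for a fully connected network with one hidden ReLU layer fitting a constant target function. It is not the article's code or proof; the width, input distribution, learning rate, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration.
d, H = 3, 16              # input dimension and hidden-layer width
target_value = 2.0        # constant target function f(x) = 2.0
learning_rate = 1e-2      # a "sufficiently small" learning rate
num_steps = 5000

rng = np.random.default_rng(0)
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(H, d))  # hidden-layer weights
b1 = np.zeros(H)                                      # hidden-layer biases
W2 = rng.normal(scale=1.0 / np.sqrt(H), size=(1, H))  # output-layer weights
b2 = np.zeros(1)                                      # output-layer bias

for step in range(num_steps):
    x = rng.uniform(0.0, 1.0, size=d)   # one i.i.d. input sample
    z = W1 @ x + b1                     # hidden pre-activations
    a = np.maximum(z, 0.0)              # ReLU activations
    y = W2 @ a + b2                     # network output
    err = y - target_value              # residual w.r.t. the constant target

    # One-sample gradients of the squared loss 0.5 * err**2.
    grad_W2 = err[:, None] * a[None, :]
    grad_b2 = err
    grad_z = (W2.T @ err) * (z > 0.0)   # ReLU subgradient, taken as 0 at the kink
    grad_W1 = grad_z[:, None] * x[None, :]
    grad_b1 = grad_z

    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

# Empirical check on a fresh sample: the squared error should be small.
x_test = rng.uniform(0.0, 1.0, size=d)
y_test = W2 @ np.maximum(W1 @ x_test + b1, 0.0) + b2
print("squared error on a fresh sample:", ((y_test - target_value) ** 2).item())
```

With a sufficiently small learning rate the printed squared error is typically close to zero, which informally mirrors the convergence of the risk established in the article.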
Related papers
50 records in total
  • [21] Gradient Descent Provably Escapes Saddle Points in the Training of Shallow ReLU Networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2024, 203 (03) : 2617 - 2648
  • [22] Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks
    An, Jing
    Lu, Jianfeng
    arXiv, 2023,
  • [23] A Convergence Analysis of Gradient Descent on Graph Neural Networks
    Awasthi, Pranjal
    Das, Abhimanyu
    Gollapudi, Sreenivas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [24] Convergence of gradient descent for learning linear neural networks
    Nguegnang, Gabin Maxime
    Rauhut, Holger
    Terstiege, Ulrich
    ADVANCES IN CONTINUOUS AND DISCRETE MODELS, 2024, 2024 (01):
  • [26] Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
    Bianchi, Pascal
    Hachem, Walid
    Schechtman, Sholom
    SET-VALUED AND VARIATIONAL ANALYSIS, 2022, 30 (03) : 1117 - 1147
  • [29] Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks
    Xiong, Xia
    Chen, Yong-Cong
    Shi, Chunxiao
    Ao, Ping
    CHINESE PHYSICS LETTERS, 2023, 40 (08)
  • [30] Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
    Zhuo, Li'an
    Zhang, Baochang
    Chen, Chen
    Ye, Qixiang
    Liu, Jianzhuang
    Doermann, David
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9348 - 9355