A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Affiliations
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; Mathematics Subject Classification: 68T99; 41A60; 65D15
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
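As a rough numerical illustration of the setting described in the abstract (not a reproduction of the paper's proof or its precise assumptions), the following NumPy sketch trains a one-hidden-layer ReLU network with plain SGD on a constant target function and reports the squared error on a fresh sample. The width H, input dimension d, learning rate gamma, input distribution, and the constant c are illustrative choices, not values taken from the paper.

# Minimal SGD sketch for a one-hidden-layer ReLU network and a constant
# target function f(x) = c; all hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, H, c, gamma, steps = 2, 16, 1.5, 1e-2, 20000

# Network x -> b2 + sum_k v[k] * relu(W[k] . x + b1[k])
W = rng.normal(size=(H, d))
b1 = np.zeros(H)
v = rng.normal(size=H) / np.sqrt(H)
b2 = 0.0

for n in range(steps):
    x = rng.uniform(0.0, 1.0, size=d)   # i.i.d. input sample on [0,1]^d
    z = W @ x + b1                       # hidden pre-activations
    a = np.maximum(z, 0.0)               # ReLU activation
    y = v @ a + b2                       # network output
    err = y - c                          # residual against the constant target
    g = (z > 0.0).astype(float)          # subgradient of ReLU, 1_{z>0}
    # Gradients of the squared loss (err**2)/2 with respect to all parameters
    grad_v, grad_b2 = err * a, err
    grad_W, grad_b1 = err * np.outer(v * g, x), err * v * g
    v -= gamma * grad_v
    b2 -= gamma * grad_b2
    W -= gamma * grad_W
    b1 -= gamma * grad_b1

x_test = rng.uniform(0.0, 1.0, size=d)
y_test = v @ np.maximum(W @ x_test + b1, 0.0) + b2
print("squared error on a fresh sample:", (y_test - c) ** 2)

In this sketch the empirical risk typically decays toward zero, consistent with the convergence statement for constant target functions, small learning rates, and i.i.d. training data; it is a sanity check of the setting, not of the theorem's hypotheses.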