A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 0
Authors
Arnulf Jentzen
Adrian Riekert
Institutions
[1] School of Data Science and Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
[2] Applied Mathematics: Institute for Analysis and Numerics, University of Münster
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization; 68T99; 41A60; 65D15
DOI
Not available
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
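For intuition, here is a minimal NumPy sketch (not the authors' code, and not the exact setting analysed in the paper) of plain SGD on the squared-error risk of such a shallow ReLU network with a constant target function. The input dimension d, width H, input distribution, initialisation, learning rate, and number of steps are illustrative assumptions.

```python
# Illustrative sketch only: SGD for f(x) = v . relu(W x + b) + c with a
# constant target function. All hyperparameters below are assumptions,
# not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, H = 3, 16            # input dimension and hidden-layer width (assumed)
gamma = 1e-2            # small constant learning rate (assumed)
target = 1.0            # constant target function

W = 0.5 * rng.normal(size=(H, d))   # inner weights
b = 0.5 * rng.normal(size=H)        # inner biases
v = 0.5 * rng.normal(size=H)        # outer weights
c = 0.0                             # outer bias

def model(x):
    return v @ np.maximum(W @ x + b, 0.0) + c

for step in range(20_000):
    x = rng.uniform(-1.0, 1.0, size=d)   # i.i.d. input sample (assumed law)
    h = np.maximum(W @ x + b, 0.0)       # ReLU hidden layer
    err = (v @ h + c) - target           # d/dy of 0.5 * (y - target)^2
    active = (h > 0.0).astype(float)     # ReLU subgradient indicator
    # One stochastic (sub)gradient step on the squared-error loss
    W -= gamma * err * (v * active)[:, None] * x[None, :]
    b -= gamma * err * v * active
    v -= gamma * err * h
    c -= gamma * err

# Monte Carlo estimate of the risk after training; for a sufficiently small
# learning rate it should be close to zero, in line with the convergence
# statement above.
risk = np.mean([(model(rng.uniform(-1.0, 1.0, size=d)) - target) ** 2
                for _ in range(1_000)])
print("estimated risk:", risk)
```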
Related papers
50 records in total
  • [41] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [42] Convergence of gradient descent algorithm with penalty term for recurrent neural networks
    Ding, Xiaoshuai
    Wang, Kuaini
INTERNATIONAL JOURNAL OF MULTIMEDIA AND UBIQUITOUS ENGINEERING, 2014, 9 (09): 151 - 158
  • [43] A STOCHASTIC TRAINING ALGORITHM FOR ARTIFICIAL NEURAL NETWORKS
    BARTLETT, EB
    NEUROCOMPUTING, 1994, 6 (01) : 31 - 43
  • [44] Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
    Jentzen, Arnulf
    Welti, Timo
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 455
  • [45] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    ENTROPY, 2020, 22 (05)
  • [46] Optimizing Deep Neural Networks Through Neuroevolution With Stochastic Gradient Descent
    Zhang, Haichao
    Hao, Kuangrong
    Gao, Lei
    Wei, Bing
    Tang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (01) : 111 - 121
  • [47] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [48] Stochastic Neural Networks with Monotonic Activation Functions
    Ravanbakhsh, Siamak
    Poczos, Barnabas
    Schneider, Jeff
    Schuurmans, Dale
    Greiner, Russell
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 809 - 818
  • [49] The convergence of stochastic gradient algorithms applied to learning in neural networks
    Stankovic, S
    Tadic, V
    AUTOMATION AND REMOTE CONTROL, 1998, 59 (07) : 1002 - 1015
  • [50] Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks
    Becker, Evan
    Pandit, Parthe
    Rangan, Sundeep
    Fletcher, Alyson K.
FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF, 2023: 892 - 896