Convergence of Stochastic Gradient Descent in Deep Neural Network

Cited by: 0
Authors
Bai-cun Zhou
Cong-ying Han
Tian-de Guo
Affiliations
[1] University of Chinese Academy of Sciences, School of Mathematical Sciences
[2] Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management
Keywords
stochastic gradient descent; deep neural network; convergence; 90C06; 90C25; 68W40
DOI: not available
Abstract
Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. SGD and its variants are the preferred methods for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous studies on the convergence of these algorithms relied on assumptions that are traditional in optimization; however, deep neural networks have their own distinctive properties, and some of these assumptions are inappropriate for the actual optimization process of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization of deep neural networks. Under the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a widely used network architecture, on the MNIST data set to verify the reasonableness of our assumptions.
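For context, the iteration analyzed in this kind of study is the plain mini-batch SGD update theta_{k+1} = theta_k - eta_k * g_k, where g_k is a stochastic gradient estimated from a randomly sampled mini-batch. The sketch below is a minimal illustration of that update on a synthetic least-squares problem; it is not the paper's algorithm or experiment, the diminishing step-size schedule, the quadratic objective, and the data are assumptions made here for illustration, and the abstract does not name which two SGD variants the authors analyze.

```python
import numpy as np

# Minimal mini-batch SGD sketch (illustrative only; not the paper's code).
# Update: theta_{k+1} = theta_k - eta_k * g_k, with g_k a stochastic gradient
# computed on a randomly sampled mini-batch.

rng = np.random.default_rng(0)

# Assumed synthetic least-squares problem: minimize (1/2n) * ||X theta - y||^2.
n, d = 1000, 20
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

def minibatch_grad(theta, batch_idx):
    """Stochastic gradient of the mean-squared loss over one mini-batch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return Xb.T @ (Xb @ theta - yb) / len(batch_idx)

theta = np.zeros(d)
batch_size, num_steps = 32, 2000
for k in range(num_steps):
    idx = rng.choice(n, size=batch_size, replace=False)
    eta_k = 0.1 / (1.0 + 0.01 * k)   # assumed diminishing step-size schedule
    theta -= eta_k * minibatch_grad(theta, idx)

print("distance to minimizer:", np.linalg.norm(theta - theta_true))
```

Replacing the quadratic objective with a LeNet-5 loss on MNIST would mirror the experimental setting mentioned in the abstract; whether such a run converges, and how fast, depends on the assumptions placed on the problem and on how eta_k is chosen, which is the kind of question the paper addresses.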
Pages: 126-136
Page count: 10
Related Papers
50 records in total
  • [41] Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis
    Bedi, Amrit Singh
    Pradhan, Hrusikesha
    Rajawat, Ketan
    2018 International Conference on Signal Processing and Communications (SPCOM 2018), 2018, pp. 402-406
  • [42] Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
    Mitra, Partha P.
    2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1890-1894
  • [43] Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network
    Lyu, Bochen
    Zhu, Zhanxing
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [44] Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network
    Chen, Minyu
    ICOMS 2018: 2018 International Conference on Mathematics and Statistics, 2018, pp. 69-73
  • [45] Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
    Shamir, Ohad
    Conference on Learning Theory, Vol. 99, 2019
  • [46] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    Mathematics, 2023, 11(3)
  • [47] Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
    Zhuo, Li'an
    Zhang, Baochang
    Chen, Chen
    Ye, Qixiang
    Liu, Jianzhuang
    Doermann, David
    Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 9348-9355
  • [48] Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent
    Xu, Jiaming
    Zhu, Hanjing
    Journal of Machine Learning Research, 2024, 25, pp. 1-83
  • [49] Stochastic Gradient Descent-Whale Optimization Algorithm-Based Deep Convolutional Neural Network To Crowd Emotion Understanding
    Ratre, Avinash
    Computer Journal, 2020, 63(2), pp. 267-282
  • [50] Universality of gradient descent neural network training
    Welper, G.
    Neural Networks, 2022, 150, pp. 259-273