Convergence of Stochastic Gradient Descent in Deep Neural Network

Cited: 0
Authors
Bai-cun Zhou
Cong-ying Han
Tian-de Guo
Affiliations
[1] University of Chinese Academy of Sciences,School of Mathematical Sciences
[2] Chinese Academy of Sciences,Key Laboratory of Big Data Mining and Knowledge Management
Keywords
stochastic gradient descent; deep neural network; convergence; 90C06; 90C25; 68W40;
DOI: not available
Abstract
Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. SGD and its variants are the preferred algorithms for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous convergence analyses of these algorithms rely on traditional assumptions from optimization theory. However, deep neural networks have distinctive properties, and some of these assumptions do not hold in the actual optimization of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization of deep neural networks. Based on the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network architecture, on the MNIST data set to verify the rationality of our assumptions.
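To make the SGD update discussed in the abstract concrete, the following is a minimal sketch (not the authors' code) of the basic iteration w_{t+1} = w_t - eta_t * g_t, where g_t is a stochastic gradient estimated from one mini-batch. The least-squares objective, batch size, and diminishing step size eta_t = 0.1/sqrt(t) are illustrative assumptions standing in for the deep-network loss and step-size schedule analyzed in the paper.

```python
# Minimal SGD sketch on a synthetic least-squares problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_star + noise.
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch):
    """Stochastic gradient of the mean-squared error on one mini-batch."""
    Xb, yb = X[batch], y[batch]
    return (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)

w = np.zeros(d)
batch_size, steps = 32, 2000
for t in range(1, steps + 1):
    batch = rng.choice(n, size=batch_size, replace=False)
    eta = 0.1 / np.sqrt(t)  # diminishing step size, a common condition in convergence proofs
    w -= eta * minibatch_grad(w, batch)

print("distance to w_star:", np.linalg.norm(w - w_star))
```

With the diminishing step size, the iterates approach w_star; a constant step size would instead stall in a noise-dominated neighborhood of the solution, which is the kind of behavior the convergence analysis characterizes.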
Pages: 126-136
Page count: 10