Convergence of Stochastic Gradient Descent in Deep Neural Network

Cited: 0
Authors
Bai-cun Zhou
Cong-ying Han
Tian-de Guo
Affiliations
[1] University of Chinese Academy of Sciences,School of Mathematical Sciences
[2] Chinese Academy of Sciences,Key Laboratory of Big Data Mining and Knowledge Management
Keywords
stochastic gradient descent; deep neural network; convergence; 90C06; 90C25; 68W40;
DOI: not available
Abstract
Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. SGD and its variants are the preferred algorithms for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous convergence analyses of these algorithms rely on traditional assumptions from optimization theory. However, deep neural networks have distinctive properties, and some of these assumptions do not hold in the actual optimization of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization of deep neural networks. Based on the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network architecture, on the MNIST data set to verify the rationality of our assumptions.
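To make the SGD update discussed in the abstract concrete, the following is a minimal sketch (not the authors' code) of the basic iteration w_{t+1} = w_t - eta_t * g_t, where g_t is a stochastic gradient estimated from one mini-batch. The least-squares objective, batch size, and diminishing step size eta_t = 0.1/sqrt(t) are illustrative assumptions standing in for the deep-network loss and step-size schedule analyzed in the paper.

```python
# Minimal SGD sketch on a synthetic least-squares problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_star + noise.
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch):
    """Stochastic gradient of the mean-squared error on one mini-batch."""
    Xb, yb = X[batch], y[batch]
    return (2.0 / len(batch)) * Xb.T @ (Xb @ w - yb)

w = np.zeros(d)
batch_size, steps = 32, 2000
for t in range(1, steps + 1):
    batch = rng.choice(n, size=batch_size, replace=False)
    eta = 0.1 / np.sqrt(t)  # diminishing step size, a common condition in convergence proofs
    w -= eta * minibatch_grad(w, batch)

print("distance to w_star:", np.linalg.norm(w - w_star))
```

With the diminishing step size, the iterates approach w_star; a constant step size would instead stall in a noise-dominated neighborhood of the solution, which is the kind of behavior the convergence analysis characterizes.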
Pages: 126-136
Page count: 10