Convergence of Stochastic Gradient Descent in Deep Neural Network

Cited by: 0
Authors
Bai-cun Zhou
Cong-ying Han
Tian-de Guo
Affiliations
[1] University of Chinese Academy of Sciences, School of Mathematical Sciences
[2] Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management
Keywords
stochastic gradient descent; deep neural network; convergence; 90C06; 90C25; 68W40
DOI: not available
Abstract
Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. SGD and its variants are the preferred methods for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous studies on the convergence of these algorithms relied on assumptions that are traditional in optimization; however, deep neural networks have their own distinctive properties, and some of these assumptions are inappropriate for the actual optimization process of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization of deep neural networks. Under the new assumptions, we study the convergence and convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a widely used network architecture, on the MNIST data set to verify the reasonableness of our assumptions.
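For context, the iteration analyzed in this kind of study is the plain mini-batch SGD update theta_{k+1} = theta_k - eta_k * g_k, where g_k is a stochastic gradient estimated from a randomly sampled mini-batch. The sketch below is a minimal illustration of that update on a synthetic least-squares problem; it is not the paper's algorithm or experiment, the diminishing step-size schedule, the quadratic objective, and the data are assumptions made here for illustration, and the abstract does not name which two SGD variants the authors analyze.

```python
import numpy as np

# Minimal mini-batch SGD sketch (illustrative only; not the paper's code).
# Update: theta_{k+1} = theta_k - eta_k * g_k, with g_k a stochastic gradient
# computed on a randomly sampled mini-batch.

rng = np.random.default_rng(0)

# Assumed synthetic least-squares problem: minimize (1/2n) * ||X theta - y||^2.
n, d = 1000, 20
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

def minibatch_grad(theta, batch_idx):
    """Stochastic gradient of the mean-squared loss over one mini-batch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return Xb.T @ (Xb @ theta - yb) / len(batch_idx)

theta = np.zeros(d)
batch_size, num_steps = 32, 2000
for k in range(num_steps):
    idx = rng.choice(n, size=batch_size, replace=False)
    eta_k = 0.1 / (1.0 + 0.01 * k)   # assumed diminishing step-size schedule
    theta -= eta_k * minibatch_grad(theta, idx)

print("distance to minimizer:", np.linalg.norm(theta - theta_true))
```

Replacing the quadratic objective with a LeNet-5 loss on MNIST would mirror the experimental setting mentioned in the abstract; whether such a run converges, and how fast, depends on the assumptions placed on the problem and on how eta_k is chosen, which is the kind of question the paper addresses.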
Pages: 126-136
Page count: 10
Related Papers
50 records in total
  • [41] Decentralized Asynchronous Stochastic Gradient Descent: Convergence Rate Analysis
    Bedi, Amrit Singh
    Pradhan, Hrusikesha
    Rajawat, Ketan
    2018 International Conference on Signal Processing and Communications (SPCOM 2018), 2018, pp. 402-406
  • [42] Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
    Mitra, Partha P.
    2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1890-1894
  • [43] Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network
    Lyu, Bochen
    Zhu, Zhanxing
    Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
  • [44] Stochastic Gradient Descent Combines Second-Order Information for Training Neural Network
    Chen, Minyu
    ICOMS 2018: 2018 International Conference on Mathematics and Statistics, 2018, pp. 69-73
  • [45] Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
    Shamir, Ohad
    Conference on Learning Theory, Vol. 99, 2019
  • [46] Recent Advances in Stochastic Gradient Descent in Deep Learning
    Tian, Yingjie
    Zhang, Yuqi
    Zhang, Haibin
    Mathematics, 2023, 11(3)
  • [47] Calibrated Stochastic Gradient Descent for Convolutional Neural Networks
    Zhuo, Li'an
    Zhang, Baochang
    Chen, Chen
    Ye, Qixiang
    Liu, Jianzhuang
    Doermann, David
    Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 9348-9355
  • [48] Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent
    Xu, Jiaming
    Zhu, Hanjing
    Journal of Machine Learning Research, 2024, 25, pp. 1-83
  • [49] Stochastic Gradient Descent-Whale Optimization Algorithm-Based Deep Convolutional Neural Network To Crowd Emotion Understanding
    Ratre, Avinash
    Computer Journal, 2020, 63(2), pp. 267-282
  • [50] Universality of gradient descent neural network training
    Welper, G.
    Neural Networks, 2022, 150, pp. 259-273