Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Compendex
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification (CLC) Number
O4 [Physics]
Discipline Code
0702
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss-function landscape under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this seeming violation of statistical physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. This function differs from the cost function in general and resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with potential for better algorithms for the latter.
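The picture in the abstract can be illustrated with a toy calculation: near a minimum, SGD behaves like a noise-driven linear process whose stationary weight variance is set by the ratio of gradient-noise strength to curvature, not by curvature alone, so anisotropic noise can make the variance larger along the sharper (less flat) direction. The following minimal sketch is not the authors' code or method; the quadratic loss, noise covariance, and learning rate are assumed values chosen only to make the effect visible.

# Minimal sketch (assumed toy values, not the paper's code): linearized SGD
# near a quadratic minimum with a hand-chosen anisotropic gradient-noise
# covariance. The stationary weight variance tracks noise-over-curvature,
# so it can be larger along the sharper direction, i.e., an inverse
# variance-flatness relation.
import numpy as np

rng = np.random.default_rng(0)

# Toy loss L(w) = 0.5 * w^T H w with one flat and one sharp direction.
H = np.diag([0.1, 5.0])             # curvatures: small = flat, large = sharp
D = np.diag([0.01, 1.0])            # assumed noise covariance, stronger along the sharp direction
noise_chol = np.linalg.cholesky(D)

eta = 0.01                          # learning rate
w = np.array([1.0, 1.0])
samples = []

for step in range(200_000):
    grad = H @ w                                 # exact gradient of the quadratic loss
    noise = noise_chol @ rng.standard_normal(2)  # stochastic (minibatch-like) gradient noise
    w = w - eta * (grad + noise)                 # plain SGD update
    if step > 50_000:                            # drop the transient, sample the stationary state
        samples.append(w.copy())

samples = np.asarray(samples)
print("curvature along each direction :", np.diag(H))
print("empirical weight variance      :", samples.var(axis=0))
print("small-eta prediction eta*D/(2H):", eta * np.diag(D) / (2.0 * np.diag(H)))

In this linearized setting an equilibrium Boltzmann distribution in the loss itself would give a variance proportional to 1/curvature (largest along the flat direction); the simulation instead follows eta*D/(2H), consistent with the abstract's point that the effective "energy" governing the stationary distribution generally differs from the cost function.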
Pages: 5
Related Papers
50 records in total
  • [21] Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks
    Nitta, Tohru
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 193 - 198
  • [22] Is Learning in Biological Neural Networks Based on Stochastic Gradient Descent? An Analysis Using Stochastic Processes
    Christensen, Soeren
    Kallsen, Jan
    NEURAL COMPUTATION, 2024, 36 (07) : 1424 - 1432
  • [23] Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers
    Paquin, Alexandre Lemire
    Chaib-draa, Brahim
    Giguere, Philippe
    NEURAL NETWORKS, 2023, 164 : 382 - 394
  • [24] Implicit Stochastic Gradient Descent for Training Physics-Informed Neural Networks
    Li, Ye
    Chen, Song-Can
    Huang, Sheng-Jun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023: 8692 - 8700
  • [25] INVERSION OF NEURAL NETWORKS BY GRADIENT DESCENT
    Kindermann, J.
    Linden, A.
    PARALLEL COMPUTING, 1990, 14 (03) : 277 - 286
  • [26] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [27] Forecasting the productivity of a solar distiller enhanced with an inclined absorber plate using stochastic gradient descent in artificial neural networks
    Mohammed, Suha A.
    Al-Haddad, Luttfi A.
    Alawee, Wissam H.
    Dhahad, Hayder A.
    Jaber, Alaa Abdulhady
    Al-Haddad, Sinan A.
    MULTISCALE AND MULTIDISCIPLINARY MODELING EXPERIMENTS AND DESIGN, 2024, 7 (03) : 1819 - 1829
  • [28] Variance-Reduced Stochastic Gradient Descent on Streaming Data
    Jothimurugesan, Ellango
    Tahmasbi, Ashraf
    Gibbons, Phillip B.
    Tirthapura, Srikanta
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [29] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [30] A proof of convergence for gradient descent in the training of artificial neural networks for constant functions
    Cheridito, Patrick
    Jentzen, Arnulf
    Riekert, Adrian
    Rossmannek, Florian
    JOURNAL OF COMPLEXITY, 2022, 72