Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Cited by: 1
Authors
Xiong, Xia [1 ]
Chen, Yong-Cong [1 ]
Shi, Chunxiao [1 ]
Ao, Ping [2 ]
Affiliations
[1] Shanghai Univ, Shanghai Ctr Quantitat Life Sci, Phys Dept, Shanghai 200444, Peoples R China
[2] Sichuan Univ, Coll Biomed Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Compendex
DOI
10.1088/0256-307X/40/8/080202
Chinese Library Classification (CLC) Number
O4 [Physics]
Discipline Code
0702
Abstract
Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss-function landscape under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this seeming violation of statistical physics principles, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. This function differs from the cost function in general and resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with potential for better algorithms for the latter.
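The picture in the abstract can be illustrated with a toy calculation: near a minimum, SGD behaves like a noise-driven linear process whose stationary weight variance is set by the ratio of gradient-noise strength to curvature, not by curvature alone, so anisotropic noise can make the variance larger along the sharper (less flat) direction. The following minimal sketch is not the authors' code or method; the quadratic loss, noise covariance, and learning rate are assumed values chosen only to make the effect visible.

# Minimal sketch (assumed toy values, not the paper's code): linearized SGD
# near a quadratic minimum with a hand-chosen anisotropic gradient-noise
# covariance. The stationary weight variance tracks noise-over-curvature,
# so it can be larger along the sharper direction, i.e., an inverse
# variance-flatness relation.
import numpy as np

rng = np.random.default_rng(0)

# Toy loss L(w) = 0.5 * w^T H w with one flat and one sharp direction.
H = np.diag([0.1, 5.0])             # curvatures: small = flat, large = sharp
D = np.diag([0.01, 1.0])            # assumed noise covariance, stronger along the sharp direction
noise_chol = np.linalg.cholesky(D)

eta = 0.01                          # learning rate
w = np.array([1.0, 1.0])
samples = []

for step in range(200_000):
    grad = H @ w                                 # exact gradient of the quadratic loss
    noise = noise_chol @ rng.standard_normal(2)  # stochastic (minibatch-like) gradient noise
    w = w - eta * (grad + noise)                 # plain SGD update
    if step > 50_000:                            # drop the transient, sample the stationary state
        samples.append(w.copy())

samples = np.asarray(samples)
print("curvature along each direction :", np.diag(H))
print("empirical weight variance      :", samples.var(axis=0))
print("small-eta prediction eta*D/(2H):", eta * np.diag(D) / (2.0 * np.diag(H)))

In this linearized setting an equilibrium Boltzmann distribution in the loss itself would give a variance proportional to 1/curvature (largest along the flat direction); the simulation instead follows eta*D/(2H), consistent with the abstract's point that the effective "energy" governing the stationary distribution generally differs from the cost function.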
Pages: 5
Related Papers
50 records in total
  • [21] Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks
    Nitta, Tohru
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 193 - 198
  • [22] Is Learning in Biological Neural Networks Based on Stochastic Gradient Descent? An Analysis Using Stochastic Processes
    Christensen, Soeren
    Kallsen, Jan
    NEURAL COMPUTATION, 2024, 36 (07) : 1424 - 1432
  • [23] Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers
    Paquin, Alexandre Lemire
    Chaib-draa, Brahim
    Giguere, Philippe
    NEURAL NETWORKS, 2023, 164 : 382 - 394
  • [24] Implicit Stochastic Gradient Descent for Training Physics-Informed Neural Networks
    Li, Ye
    Chen, Song-Can
    Huang, Sheng-Jun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023: 8692 - 8700
  • [25] INVERSION OF NEURAL NETWORKS BY GRADIENT DESCENT
    Kindermann, J.
    Linden, A.
    PARALLEL COMPUTING, 1990, 14 (03) : 277 - 286
  • [26] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [27] Forecasting the productivity of a solar distiller enhanced with an inclined absorber plate using stochastic gradient descent in artificial neural networks
    Mohammed, Suha A.
    Al-Haddad, Luttfi A.
    Alawee, Wissam H.
    Dhahad, Hayder A.
    Jaber, Alaa Abdulhady
    Al-Haddad, Sinan A.
    MULTISCALE AND MULTIDISCIPLINARY MODELING EXPERIMENTS AND DESIGN, 2024, 7 (03) : 1819 - 1829
  • [28] Variance-Reduced Stochastic Gradient Descent on Streaming Data
    Jothimurugesan, Ellango
    Tahmasbi, Ashraf
    Gibbons, Phillip B.
    Tirthapura, Srikanta
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [29] On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants
    Reddi, Sashank J.
    Hefny, Ahmed
    Sra, Suvrit
    Poczos, Barnabas
    Smola, Alex
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [30] A proof of convergence for gradient descent in the training of artificial neural networks for constant functions
    Cheridito, Patrick
    Jentzen, Arnulf
    Riekert, Adrian
    Rossmannek, Florian
    JOURNAL OF COMPLEXITY, 2022, 72