Comparing dynamics: deep neural networks versus glassy systems

Cited by: 23
Authors
Baity-Jesi, Marco [1 ,2 ]
Sagun, Levent [3 ,4 ]
Geiger, Mario [4 ]
Spigler, Stefano [3 ,4 ]
Ben Arous, Gerard [5]
Cammarota, Chiara [6]
LeCun, Yann [5 ,7 ,8 ]
Wyart, Matthieu [4 ]
Biroli, Giulio [3 ,9 ]
Affiliations
[1] Eawag, Dept Syst Anal Integrated Assessment & Modelling, Swiss Fed Inst Aquat Sci & Technol, CH-8600 Dubendorf, Switzerland
[2] Columbia Univ, Dept Chem, New York, NY 10027 USA
[3] Univ Paris Saclay, Inst Phys Theor, CEA, CNRS, F-91191 Gif Sur Yvette, France
[4] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[5] NYU, Courant Inst Math Sci, New York, NY USA
[6] Kings Coll London, Dept Math, London WC2R 2LS, England
[7] NYU, Ctr Data Sci, New York, NY USA
[8] Facebook Inc, Facebook AI Res, New York, NY USA
[9] Sorbonne Univ, PSL Res Univ, CNRS, Lab Phys Stat,Ecole Normale Super, F-75005 Paris, France
Funding
Swiss National Science Foundation;
Keywords
machine learning;
DOI
10.1088/1742-5468/ab3281
Chinese Library Classification
O3 [Mechanics]
Discipline Classification Code
08; 0801
Abstract
We numerically analyze the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during training the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss approaches zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinct dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes differ. In contrast, when the network is under-parametrized we observe typical glassy behavior, suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
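The paper's central diagnostic is a two-time observable borrowed from glass physics: the mean-square displacement of the weights between a waiting time t_w and a later time t. The sketch below is a minimal illustration of that measurement, not the authors' code; the toy model, random data, optimizer settings, and names such as `flat_weights` and `msd` are assumptions made for this example.

```python
# Minimal sketch (not the authors' code): track the two-time mean-square
# displacement Delta(t_w, t) = |w(t) - w(t_w)|^2 / N of the weights during
# training, a standard glassy-dynamics observable. The toy model, random
# data, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy over-parametrized classifier on random data (stand-in for a real dataset).
X = torch.randn(512, 20)
y = torch.randint(0, 2, (512,))
model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)  # full-batch gradient descent here
loss_fn = nn.CrossEntropyLoss()

def flat_weights(m):
    """Concatenate all parameters into one detached vector."""
    return torch.cat([p.detach().reshape(-1) for p in m.parameters()])

waiting_times = [10, 100, 1000]           # steps t_w at which w(t_w) is stored
references = {}                           # t_w -> copy of the weights at t_w
msd = {tw: [] for tw in waiting_times}    # t_w -> [Delta(t_w, t) for t >= t_w]
n_params = flat_weights(model).numel()

for step in range(1, 5001):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

    w = flat_weights(model)
    if step in waiting_times:
        references[step] = w.clone()
    for tw, w_ref in references.items():
        # Per-parameter mean-square displacement since the waiting time t_w.
        msd[tw].append(((w - w_ref) ** 2).sum().item() / n_params)
```

Plotting msd[tw] against t - t_w on logarithmic axes separates the regimes discussed in the abstract: roughly diffusive growth at the bottom of the landscape for over-parametrized networks, versus an aging-like dependence on t_w in the glassy, under-parametrized phase.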
Pages: 15
Related Papers
50 records in total
  • [31] Slow dynamics in glassy systems
    Yu, CC
    PHILOSOPHICAL MAGAZINE B-PHYSICS OF CONDENSED MATTER STATISTICAL MECHANICS ELECTRONIC OPTICAL AND MAGNETIC PROPERTIES, 2001, 81 (09): 1209 - 1223
  • [32] Slow dynamics of glassy systems
    Parisi, G
    PHYSICS OF COMPLEX SYSTEMS, 1997, 134 : 517 - 532
  • [33] Glassy dynamics in icosahedral systems
    Caflisch, RG
    Levine, H
    Banavar, JR
    PHYSICAL REVIEW LETTERS, 1986, 57 (21) : 2679 - 2682
  • [34] Slow dynamics of glassy systems
    Parisi, G
    COMPLEX BEHAVIOUR OF GLASSY SYSTEMS, 1997, 492 : 111 - 121
  • [35] Critical dynamics in glassy systems
    Parisi, Giorgio
    Rizzo, Tommaso
    PHYSICAL REVIEW E, 2013, 87 (01):
  • [36] Exploring glassy dynamics with Markov state models from graph dynamical neural networks
    Soltani, Siavash
    Sinclair, Chad W.
    Rottler, Joerg
    PHYSICAL REVIEW E, 2022, 106 (02)
  • [37] Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
    Li, Chunyuan
    Chen, Changyou
    Carlson, David
    Carin, Lawrence
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1788 - 1794
  • [38] Learning dynamics of gradient descent optimization in deep neural networks
    Wu, Wei
    Jing, Xiaoyuan
    Du, Wencai
    Chen, Guoliang
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (05)
  • [39] Spontaneous dynamics of neural networks in deep layers of prefrontal cortex
    Blaeser, Andrew S.
    Connors, Barry W.
    Nurmikko, Arto V.
    JOURNAL OF NEUROPHYSIOLOGY, 2017, 117 (04) : 1581 - 1594
  • [40] Sparse deep neural networks for modeling aluminum electrolysis dynamics
    Lundby, Erlend Torje Berg
    Rasheed, Adil
    Gravdahl, Jan Tommy
    Halvorsen, Ivar Johan
    APPLIED SOFT COMPUTING, 2023, 134