Comparing dynamics: deep neural networks versus glassy systems

Cited by: 23
Authors
Baity-Jesi, Marco [1 ,2 ]
Sagun, Levent [3 ,4 ]
Geiger, Mario [4 ]
Spigler, Stefano [3 ,4 ]
Ben Arpus, Gerard [5 ]
Cammarpta, Chiara [6 ]
LeCun, Yann [5 ,7 ,8 ]
Wyart, Matthieu [4 ]
Biroli, Giulio [3 ,9 ]
Affiliations
[1] Eawag, Dept Syst Anal Integrated Assessment & Modelling, Swiss Fed Inst Aquat Sci & Technol, CH-8600 Dubendorf, Switzerland
[2] Columbia Univ, Dept Chem, New York, NY 10027 USA
[3] Univ Paris Saclay, Inst Phys Theor, CEA, CNRS, F-91191 Gif Sur Yvette, France
[4] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[5] NYU, Courant Inst Math Sci, New York, NY USA
[6] Kings Coll London, Dept Math, London WC2R 2LS, England
[7] NYU, Ctr Data Sci, New York, NY USA
[8] Facebook Inc, Facebook AI Res, New York, NY USA
[9] Sorbonne Univ, PSL Res Univ, CNRS, Lab Phys Stat, Ecole Normale Super, F-75005 Paris, France
Funding
Swiss National Science Foundation;
Keywords
machine learning;
DOI
10.1088/1742-5468/ab3281
CLC Number
O3 [Mechanics];
Subject Classification Code
08; 0801;
Abstract
We analyze numerically the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during training the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss approaches zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe typical glassy behavior, suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
Pages: 15
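
The abstract above describes tracking how the network's weights move during training and comparing that motion with relaxation in glassy systems. A minimal sketch of one such measurement, assuming the standard glassy-dynamics observable Delta(t_w, t) = |w(t) - w(t_w)|^2 / N (the two-time mean-square displacement of the parameters), is given below; the small MLP, the synthetic data, and all hyperparameters are illustrative placeholders, not the architectures or datasets used in the paper.

# Minimal sketch (not the authors' code): measure the two-time mean-square
# displacement Delta(t_w, t) of the weights of a small MLP trained with SGD
# on synthetic data. In glassy dynamics this observable helps distinguish
# aging and barrier crossing from diffusion along flat directions.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)                     # synthetic inputs (placeholder data)
y = torch.randint(0, 2, (512,))              # random binary labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def flat_weights(m):
    # Concatenate every parameter tensor into one flat vector w(t).
    return torch.cat([p.detach().flatten() for p in m.parameters()])

n_params = flat_weights(model).numel()
waiting_times = [10, 100, 1000]              # snapshot times t_w
snapshots = {}

for step in range(1, 5001):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

    if step in waiting_times:                # store w(t_w)
        snapshots[step] = flat_weights(model).clone()

    if step % 1000 == 0:                     # report Delta(t_w, t) and the loss
        w_t = flat_weights(model)
        for t_w, w_tw in snapshots.items():
            delta = ((w_t - w_tw) ** 2).sum().item() / n_params
            print(f"t={step:5d}  t_w={t_w:4d}  Delta={delta:.4e}  loss={loss.item():.4f}")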