Comparing dynamics: deep neural networks versus glassy systems

Cited by: 23
Authors
Baity-Jesi, Marco [1 ,2 ]
Sagun, Levent [3 ,4 ]
Geiger, Mario [4 ]
Spigler, Stefano [3 ,4 ]
Ben Arous, Gerard [5]
Cammarota, Chiara [6]
LeCun, Yann [5 ,7 ,8 ]
Wyart, Matthieu [4 ]
Biroli, Giulio [3 ,9 ]
Affiliations
[1] Eawag, Dept Syst Anal Integrated Assessment & Modelling, Swiss Fed Inst Aquat Sci & Technol, CH-8600 Dubendorf, Switzerland
[2] Columbia Univ, Dept Chem, New York, NY 10027 USA
[3] Univ Paris Saclay, Inst Phys Theor, CEA, CNRS, F-91191 Gif Sur Yvette, France
[4] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[5] NYU, Courant Inst Math Sci, New York, NY USA
[6] Kings Coll London, Dept Math, London WC2R 2LS, England
[7] NYU, Ctr Data Sci, New York, NY USA
[8] Facebook Inc, Facebook AI Res, New York, NY USA
[9] Sorbonne Univ, PSL Res Univ, CNRS, Lab Phys Stat,Ecole Normale Super, F-75005 Paris, France
Funding
Swiss National Science Foundation;
Keywords
machine learning;
DOI
10.1088/1742-5468/ab3281
Chinese Library Classification (CLC)
O3 [Mechanics];
Discipline code
08; 0801;
Abstract
We analyze numerically the training dynamics of deep neural networks (DNNs) using methods developed in the statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during training the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss approaches zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe typical glassy behavior, suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.
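The analysis described in the abstract relies on glassy-dynamics observables, such as two-time quantities measured on the weights during training. The snippet below is a minimal, hypothetical sketch (not the authors' code) of one such measurement: the mean-squared displacement Delta(t_w, t) between a weight snapshot taken at a waiting time t_w and the weights at a later time t, recorded while a small one-hidden-layer network is trained by full-batch gradient descent on random data. The network size, learning rate, data, and waiting times are all illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): measure the two-time
# mean-squared displacement Delta(t_w, t) = mean(|w(t) - w(t_w)|^2)
# of the weights of a toy one-hidden-layer network trained by
# full-batch gradient descent on random data. All sizes and
# hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Random inputs with random +/-1 labels: an over-parametrized fit,
# in the spirit of the over-parametrized regime discussed above.
P, d, h = 64, 20, 200                 # samples, input dim, hidden width
X = rng.standard_normal((P, d))
y = rng.choice([-1.0, 1.0], size=P)

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal(h) / np.sqrt(h)

def forward(X, W1, W2):
    a = np.tanh(X @ W1)               # hidden activations
    return a @ W2, a                  # scalar output per sample

def gd_step(W1, W2, X, y, lr=0.05):
    out, a = forward(X, W1, W2)
    err = out - y                     # gradient of 0.5 * MSE w.r.t. output
    gW2 = a.T @ err / len(y)
    gA = np.outer(err, W2) * (1.0 - a**2)   # backprop through tanh
    gW1 = X.T @ gA / len(y)
    return W1 - lr * gW1, W2 - lr * gW2

def flatten(W1, W2):
    return np.concatenate([W1.ravel(), W2.ravel()])

waiting_times = (10, 100, 1000)       # illustrative choices of t_w
snapshots = {}                        # stored weights w(t_w)
msd = {tw: [] for tw in waiting_times}

for t in range(1, 5001):
    W1, W2 = gd_step(W1, W2, X, y)
    w_t = flatten(W1, W2)
    if t in waiting_times:
        snapshots[t] = w_t.copy()
    for tw, w_tw in snapshots.items():
        if t > tw:
            msd[tw].append((t, np.mean((w_t - w_tw) ** 2)))

# Print the last measured displacement for each waiting time.
for tw in waiting_times:
    t_last, d_last = msd[tw][-1]
    print(f"t_w = {tw:5d}   Delta(t_w, t = {t_last}) = {d_last:.3e}")
```

A Delta(t_w, t) that keeps growing with t and depends on the waiting time t_w is the kind of diffusive, aging-like signature the abstract refers to; in an actual experiment the toy data and network would be replaced by the architectures and datasets studied in the paper.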
Pages: 15
Related papers
50 items in total
  • [1] Comparing Dynamics: Deep Neural Networks versus Glassy Systems
    Baity-Jesi, Marco
    Sagun, Levent
    Geiger, Mario
    Spigler, Stefano
    Ben Arous, Gerard
    Cammarota, Chiara
    LeCun, Yann
    Wyart, Matthieu
    Biroli, Giulio
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [2] Comparing Speed Reduction of Adversarial Defense Systems on Deep Neural Networks
    Bowman, Andrew
    Yang, Xin
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGING SYSTEMS AND TECHNIQUES (IST), 2021,
  • [3] Selection dynamics for deep neural networks
    Liu, Hailiang
    Markowich, Peter
    JOURNAL OF DIFFERENTIAL EQUATIONS, 2020, 269 (12) : 11540 - 11574
  • [4] Deep neural networks and molecular dynamics
    Car, Roberto
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2018, 256
  • [5] Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
    Huang, Jiaoyang
    Yau, Horng-Tzer
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [6] Comparing Deep and Dendrite Neural Networks: A Case Study
    Hernandez, Gerardo
    Zamora, Erik
    Sossa, Humberto
    PATTERN RECOGNITION (MCPR 2017), 2017, 10267 : 32 - 41
  • [7] Comparing the Visual Representations and Performance of Humans and Deep Neural Networks
    Jacobs, Robert A.
    Bates, Christopher J.
    CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE, 2019, 28 (01) : 34 - 39
  • [8] Understanding and Comparing Deep Neural Networks for Age and Gender Classification
    Lapuschkin, Sebastian
    Binder, Alexander
    Mueller, Klaus-Robert
    Samek, Wojciech
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 1629 - 1638
  • [9] Contextual modulation of affect: Comparing humans and deep neural networks
    Shin, Soomin
    Kim, Doo Yon
    Wallraven, Christian
    COMPANION PUBLICATION OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 127 - 133