Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
CLC classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
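The abstract describes Entropy-SGD as gradient descent on a relaxed loss u (the solution of a viscous Hamilton-Jacobi PDE), where the gradient of u at x is recovered from the mean of an inner stochastic gradient Langevin dynamics (SGLD) loop. A minimal sketch of that two-loop structure on a hypothetical 1-D toy loss follows; the function f, the step sizes, the inner-loop length, and the temperature are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Hypothetical 1-D nonconvex loss standing in for a network's training loss.
def f(y):
    return np.sin(3.0 * y) + 0.1 * y**2

def grad_f(y):
    return 3.0 * np.cos(3.0 * y) + 0.2 * y

def entropy_sgd_step(x, gamma=1.0, eta=0.1, inner_steps=20,
                     sgld_step=0.01, temperature=1e-3, seed=0):
    """One outer step of an Entropy-SGD-style update (sketch).

    The inner SGLD loop samples y approximately from
    exp(-f(y) - |x - y|^2 / (2*gamma)); the running mean <y> yields the
    gradient of the relaxed loss u:  grad u(x) = (x - <y>) / gamma.
    """
    rng = np.random.default_rng(seed)
    y, y_mean = x, 0.0
    for k in range(1, inner_steps + 1):
        # Gradient of the local-entropy objective f(y) + |x - y|^2 / (2*gamma)
        g = grad_f(y) + (y - x) / gamma
        y = y - sgld_step * g + np.sqrt(2.0 * sgld_step * temperature) * rng.normal()
        y_mean += (y - y_mean) / k   # running average <y>
    # Descend the relaxed loss u instead of the raw loss f
    return x - eta * (x - y_mean) / gamma

x = 2.0
for _ in range(50):
    x = entropy_sgd_step(x)
```

Replacing grad_f with a minibatch gradient of a network loss gives the practical algorithm; the quadratic coupling (y - x)/gamma is what smooths the energy landscape that the paper's PDE regularity results analyze.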
Pages: 1-30 (30 pages)
Related papers (50 in total)
  • [21] Generalization bounds for neural ordinary differential equations and deep residual networks
    Marion, Pierre
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [22] Deep neural network modeling of unknown partial differential equations in nodal space
    Chen, Zhen
    Churchill, Victor
    Wu, Kailiang
    Xiu, Dongbin
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 449
  • [23] An improved data-free surrogate model for solving partial differential equations using deep neural networks
    Chen, Xinhai
    Chen, Rongliang
    Wan, Qian
    Xu, Rui
    Liu, Jie
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)
  • [25] Discrete gradient flow approximations of high dimensional evolution partial differential equations via deep neural networks
    Georgoulis, Emmanuil H.
    Loulakis, Michail
    Tsiourvas, Asterios
    [J]. COMMUNICATIONS IN NONLINEAR SCIENCE AND NUMERICAL SIMULATION, 2023, 117
  • [26] Deep ReLU neural networks overcome the curse of dimensionality for partial integrodifferential equations
    Gonon, Lukas
    Schwab, Christoph
    [J]. ANALYSIS AND APPLICATIONS, 2023, 21 (01) : 1 - 47
  • [27] Transferable Neural Networks for Partial Differential Equations
    Zhang, Zezhong
    Bao, Feng
    Ju, Lili
    Zhang, Guannan
    [J]. JOURNAL OF SCIENTIFIC COMPUTING, 2024, 99 (01)
  • [29] Simulating Partial Differential Equations with Neural Networks
    Chertock, Anna
    Leonard, Christopher
    [J]. HYPERBOLIC PROBLEMS: THEORY, NUMERICS, APPLICATIONS, VOL II, HYP2022, 2024, 35 : 39 - 49
  • [30] Deep neural networks based temporal-difference methods for high-dimensional parabolic partial differential equations
    Zeng, Shaojie
    Cai, Yihua
    Zou, Qingsong
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 468