Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
Chinese Library Classification (CLC)
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method that has been used successfully to train deep neural networks. The algorithm, originally motivated by statistical physics, is interpreted here as gradient descent on a modified loss function: the modified, or relaxed, loss is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory gives a better understanding of the algorithm's convergence, and a stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
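The viscous Hamilton-Jacobi PDE referenced in the abstract can be written, following the paper's setup (a sketch, with f the original training loss and beta the inverse temperature):

    \partial_t u = -\tfrac{1}{2}\,\lvert \nabla u \rvert^{2}
                   + \tfrac{\beta^{-1}}{2}\,\Delta u,
    \qquad u(x, 0) = f(x)

In practice, gradient descent on the relaxed loss u(., t) is carried out with an inner loop of stochastic gradient Langevin dynamics (SGLD) that estimates the mean of a local Gibbs measure centered at the current iterate. The following Python/NumPy sketch illustrates that two-loop structure on a toy one-dimensional loss; the toy function, the helper name entropy_sgd_step, and all step sizes and loop lengths are illustrative assumptions, not the paper's experimental settings.

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        # Toy non-convex loss standing in for a network's training loss.
        return 0.25 * x**4 - x**2 + 0.1 * np.cos(8 * x)

    def grad_f(x):
        # Gradient of the toy loss; for a real network this would be a
        # minibatch gradient.
        return x**3 - 2.0 * x - 0.8 * np.sin(8 * x)

    def entropy_sgd_step(x, gamma=0.1, beta=1e3, inner_steps=20,
                         inner_lr=0.01, outer_lr=0.05, alpha=0.75):
        # Inner loop: SGLD on f(y) + |y - x|^2 / (2*gamma), i.e. sampling
        # from the local Gibbs measure centered at the current iterate x.
        y, mu = x, x
        for _ in range(inner_steps):
            noise = rng.standard_normal()
            y -= inner_lr * (grad_f(y) + (y - x) / gamma)
            y += np.sqrt(2.0 * inner_lr / beta) * noise
            mu = alpha * mu + (1.0 - alpha) * y  # running average of iterates
        # Outer update: a gradient step on the relaxed (PDE-smoothed) loss,
        # whose gradient is (x - E[y]) / gamma with E[y] estimated by mu.
        return x - outer_lr * (x - mu) / gamma

    x = 2.5
    for _ in range(200):
        x = entropy_sgd_step(x)
    print(f"x = {x:.4f}, f(x) = {f(x):.4f}")

Larger gamma widens the local Gibbs measure and thus smooths the loss more aggressively; the outer step is a descent step on the relaxed loss because its gradient equals (x - E[y]) / gamma, with E[y] approximated by the running average mu.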
Pages: 1-30