Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
CLC classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
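The abstract describes Entropy-SGD as gradient descent on a relaxed loss u (the solution of a viscous Hamilton-Jacobi PDE), where the gradient of u at x is recovered from the mean of an inner stochastic gradient Langevin dynamics (SGLD) loop. A minimal sketch of that two-loop structure on a hypothetical 1-D toy loss follows; the function f, the step sizes, the inner-loop length, and the temperature are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Hypothetical 1-D nonconvex loss standing in for a network's training loss.
def f(y):
    return np.sin(3.0 * y) + 0.1 * y**2

def grad_f(y):
    return 3.0 * np.cos(3.0 * y) + 0.2 * y

def entropy_sgd_step(x, gamma=1.0, eta=0.1, inner_steps=20,
                     sgld_step=0.01, temperature=1e-3, seed=0):
    """One outer step of an Entropy-SGD-style update (sketch).

    The inner SGLD loop samples y approximately from
    exp(-f(y) - |x - y|^2 / (2*gamma)); the running mean <y> yields the
    gradient of the relaxed loss u:  grad u(x) = (x - <y>) / gamma.
    """
    rng = np.random.default_rng(seed)
    y, y_mean = x, 0.0
    for k in range(1, inner_steps + 1):
        # Gradient of the local-entropy objective f(y) + |x - y|^2 / (2*gamma)
        g = grad_f(y) + (y - x) / gamma
        y = y - sgld_step * g + np.sqrt(2.0 * sgld_step * temperature) * rng.normal()
        y_mean += (y - y_mean) / k   # running average <y>
    # Descend the relaxed loss u instead of the raw loss f
    return x - eta * (x - y_mean) / gamma

x = 2.0
for _ in range(50):
    x = entropy_sgd_step(x)
```

Replacing grad_f with a minibatch gradient of a network loss gives the practical algorithm; the quadratic coupling (y - x)/gamma is what smooths the energy landscape that the paper's PDE regularity results analyze.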
Pages: 1-30 (30 pages)
Related papers (50 in total)
  • [21] Generalization bounds for neural ordinary differential equations and deep residual networks
    Marion, Pierre
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [22] Deep neural network modeling of unknown partial differential equations in nodal space
    Chen, Zhen
    Churchill, Victor
    Wu, Kailiang
    Xiu, Dongbin
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 449
  • [23] An improved data-free surrogate model for solving partial differential equations using deep neural networks
    Chen, Xinhai
    Chen, Rongliang
    Wan, Qian
    Xu, Rui
    Liu, Jie
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)
  • [25] Discrete gradient flow approximations of high dimensional evolution partial differential equations via deep neural networks
    Georgoulis, Emmanuil H.
    Loulakis, Michail
    Tsiourvas, Asterios
    [J]. COMMUNICATIONS IN NONLINEAR SCIENCE AND NUMERICAL SIMULATION, 2023, 117
  • [26] Deep ReLU neural networks overcome the curse of dimensionality for partial integrodifferential equations
    Gonon, Lukas
    Schwab, Christoph
    [J]. ANALYSIS AND APPLICATIONS, 2023, 21 (01) : 1 - 47
  • [27] Transferable Neural Networks for Partial Differential Equations
    Zhang, Zezhong
    Bao, Feng
    Ju, Lili
    Zhang, Guannan
    [J]. JOURNAL OF SCIENTIFIC COMPUTING, 2024, 99 (01)
  • [29] Simulating Partial Differential Equations with Neural Networks
    Chertock, Anna
    Leonard, Christopher
    [J]. HYPERBOLIC PROBLEMS: THEORY, NUMERICS, APPLICATIONS, VOL II, HYP2022, 2024, 35 : 39 - 49
  • [30] Deep neural networks based temporal-difference methods for high-dimensional parabolic partial differential equations
    Zeng, Shaojie
    Cai, Yihua
    Zou, Qingsong
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2022, 468