Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
Chinese Library Classification (CLC)
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method that has been used successfully to train deep neural networks. The algorithm, originally motivated by statistical physics, is interpreted here as gradient descent on a modified loss function: the modified, or relaxed, loss is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory gives a better understanding of the algorithm's convergence, and a stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
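The viscous Hamilton-Jacobi PDE referenced in the abstract can be written, following the paper's setup (a sketch, with f the original training loss and beta the inverse temperature):

    \partial_t u = -\tfrac{1}{2}\,\lvert \nabla u \rvert^{2}
                   + \tfrac{\beta^{-1}}{2}\,\Delta u,
    \qquad u(x, 0) = f(x)

In practice, gradient descent on the relaxed loss u(., t) is carried out with an inner loop of stochastic gradient Langevin dynamics (SGLD) that estimates the mean of a local Gibbs measure centered at the current iterate. The following Python/NumPy sketch illustrates that two-loop structure on a toy one-dimensional loss; the toy function, the helper name entropy_sgd_step, and all step sizes and loop lengths are illustrative assumptions, not the paper's experimental settings.

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        # Toy non-convex loss standing in for a network's training loss.
        return 0.25 * x**4 - x**2 + 0.1 * np.cos(8 * x)

    def grad_f(x):
        # Gradient of the toy loss; for a real network this would be a
        # minibatch gradient.
        return x**3 - 2.0 * x - 0.8 * np.sin(8 * x)

    def entropy_sgd_step(x, gamma=0.1, beta=1e3, inner_steps=20,
                         inner_lr=0.01, outer_lr=0.05, alpha=0.75):
        # Inner loop: SGLD on f(y) + |y - x|^2 / (2*gamma), i.e. sampling
        # from the local Gibbs measure centered at the current iterate x.
        y, mu = x, x
        for _ in range(inner_steps):
            noise = rng.standard_normal()
            y -= inner_lr * (grad_f(y) + (y - x) / gamma)
            y += np.sqrt(2.0 * inner_lr / beta) * noise
            mu = alpha * mu + (1.0 - alpha) * y  # running average of iterates
        # Outer update: a gradient step on the relaxed (PDE-smoothed) loss,
        # whose gradient is (x - E[y]) / gamma with E[y] estimated by mu.
        return x - outer_lr * (x - mu) / gamma

    x = 2.5
    for _ in range(200):
        x = entropy_sgd_step(x)
    print(f"x = {x:.4f}, f(x) = {f(x):.4f}")

Larger gamma widens the local Gibbs measure and thus smooths the loss more aggressively; the outer step is a descent step on the relaxed loss because its gradient equals (x - E[y]) / gamma, with E[y] approximated by the running average mu.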
Pages: 1-30