Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
Chinese Library Classification
O1 [Mathematics]
Discipline Classification Code
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
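A minimal sketch of the relaxation described in the abstract (the symbols f, u, beta, and T are illustrative and not quoted from this record): writing f for the original training loss and beta^{-1} for a temperature parameter, the viscous Hamilton-Jacobi PDE takes the form

u_t = -\frac{1}{2}\,|\nabla u|^{2} + \frac{\beta^{-1}}{2}\,\Delta u, \qquad u(x, 0) = f(x),

and, under this interpretation, Entropy-SGD amounts to gradient descent on x \mapsto u(x, T) for a fixed relaxation time T > 0, rather than on f directly.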
Pages: 1 - 30
Page count: 30
Related Papers
50 records in total
  • [41] Optimizing Accelerator on FPGA for Deep Convolutional Neural Networks
    Dong, Yong
    Hu, Wei
    Wang, Yonghao
    Jiao, Qiang
    Chen, Shuang
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II, 2020, 12453 : 97 - 110
  • [43] Optimizing for Interpretability in Deep Neural Networks with Tree Regularization
    Wu, Mike
    Parbhoo, Sonali
    Hughes, Michael C.
    Roth, Volker
    Doshi-Velez, Finale
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 72 : 1 - 37
  • [44] Connections between deep learning and partial differential equations
    Burger, M.
    E, W.
    Ruthotto, L.
    Osher, S. J.
    [J]. EUROPEAN JOURNAL OF APPLIED MATHEMATICS, 2021, 32 (03) : 395 - 396
  • [45] Deep finite volume method for partial differential equations
    Cen, Jianhuan
    Zou, Qingsong
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2024, 517
  • [46] ReluDiff: Differential Verification of Deep Neural Networks
    Paulsen, Brandon
    Wang, Jingbo
    Wang, Chao
    [J]. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, : 714 - 726
  • [47] Solving Partial Differential Equations with Bernstein Neural Networks
    Razvarz, Sina
    Jafari, Raheleh
    Gegov, Alexander
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS (UKCI), 2019, 840 : 57 - 70
  • [48] Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations
    Lu, Yiping
    Zhong, Aoxiao
    Li, Quanzheng
    Dong, Bin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [49] Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks
    Karumuri, Sharmila
    Tripathy, Rohit
    Bilionis, Ilias
    Panchal, Jitesh
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 404
  • [50] Is the neural tangent kernel of PINNs deep learning general partial differential equations always convergent?
    Zhou, Zijian
    Yan, Zhenya
    [J]. PHYSICA D-NONLINEAR PHENOMENA, 2024, 457