Deep relaxation: partial differential equations for optimizing deep neural networks

Cited by: 61
Authors
Chaudhari, Pratik [1 ]
Oberman, Adam [2 ]
Osher, Stanley [3 ,4 ]
Soatto, Stefano [1 ]
Carlier, Guillaume [5 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[4] Univ Calif Los Angeles, Inst Pure & Appl Math, Los Angeles, CA USA
[5] Univ Paris IX Dauphine, CEREMADE, Paris, France
Keywords
Deep learning; Partial differential equations; Stochastic gradient descent; Neural networks; Optimal control; Proximal
DOI
10.1007/s40687-018-0148-y
Chinese Library Classification
O1 [Mathematics]
Discipline Classification Code
0701; 070101
Abstract
Entropy-SGD is a first-order optimization method which has been used successfully to train deep neural networks. This algorithm, which was motivated by statistical physics, is now interpreted as gradient descent on a modified loss function. The modified, or relaxed, loss function is the solution of a viscous Hamilton-Jacobi partial differential equation (PDE). Experimental results on modern, high-dimensional neural networks demonstrate that the algorithm converges faster than the benchmark stochastic gradient descent (SGD). Well-established PDE regularity results allow us to analyze the geometry of the relaxed energy landscape, confirming empirical evidence. Stochastic homogenization theory allows us to better understand the convergence of the algorithm. A stochastic control interpretation is used to prove that a modified algorithm converges faster than SGD in expectation.
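A minimal sketch of the relaxation described in the abstract (the symbols f, u, beta, and T are illustrative and not quoted from this record): writing f for the original training loss and beta^{-1} for a temperature parameter, the viscous Hamilton-Jacobi PDE takes the form

u_t = -\frac{1}{2}\,|\nabla u|^{2} + \frac{\beta^{-1}}{2}\,\Delta u, \qquad u(x, 0) = f(x),

and, under this interpretation, Entropy-SGD amounts to gradient descent on x \mapsto u(x, T) for a fixed relaxation time T > 0, rather than on f directly.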
Pages: 1 - 30
Page count: 30
Related Papers
50 records in total
  • [41] Optimizing Accelerator on FPGA for Deep Convolutional Neural Networks
    Dong, Yong
    Hu, Wei
    Wang, Yonghao
    Jiao, Qiang
    Chen, Shuang
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II, 2020, 12453 : 97 - 110
  • [43] Optimizing for Interpretability in Deep Neural Networks with Tree Regularization
    Wu, Mike
    Parbhoo, Sonali
    Hughes, Michael C.
    Roth, Volker
    Doshi-Velez, Finale
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 72 : 1 - 37
  • [44] Connections between deep learning and partial differential equations
    Burger, M.
    E, W.
    Ruthotto, L.
    Osher, S. J.
    [J]. EUROPEAN JOURNAL OF APPLIED MATHEMATICS, 2021, 32 (03) : 395 - 396
  • [45] Deep finite volume method for partial differential equations
    Cen, Jianhuan
    Zou, Qingsong
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2024, 517
  • [46] ReluDiff: Differential Verification of Deep Neural Networks
    Paulsen, Brandon
    Wang, Jingbo
    Wang, Chao
    [J]. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, : 714 - 726
  • [47] Solving Partial Differential Equations with Bernstein Neural Networks
    Razvarz, Sina
    Jafari, Raheleh
    Gegov, Alexander
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS (UKCI), 2019, 840 : 57 - 70
  • [48] Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations
    Lu, Yiping
    Zhong, Aoxiao
    Li, Quanzheng
    Dong, Bin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [49] Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks
    Karumuri, Sharmila
    Tripathy, Rohit
    Bilionis, Ilias
    Panchal, Jitesh
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 404
  • [50] Is the neural tangent kernel of PINNs deep learning general partial differential equations always convergent?
    Zhou, Zijian
    Yan, Zhenya
    [J]. PHYSICA D-NONLINEAR PHENOMENA, 2024, 457