Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

Cited by: 0
Authors
Lu, Yiping [1 ]
Zhong, Aoxiao [2 ]
Li, Quanzheng [2 ,3 ,4 ]
Dong, Bin [4 ,5 ,6 ]
Affiliations
[1] Peking Univ, Sch Math Sci, Beijing, Peoples R China
[2] Harvard Med Sch, Massachusetts Gen Hosp, MGH BWH Ctr Clin Data Sci, Boston, MA 02115 USA
[3] Peking Univ, Ctr Data Sci Hlth & Med, Beijing, Peoples R China
[4] Beijing Inst Big Data Res, Lab Biomed Image Anal, Beijing, Peoples R China
[5] Peking Univ, Beijing Int Ctr Math Res, Beijing, Peoples R China
[6] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
Funding
US National Institutes of Health (NIH);
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance for network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding offers a new perspective on the design of effective deep architectures: we can draw on the rich knowledge of numerical analysis to guide the design of new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture), inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt, respectively) achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress the original networks while maintaining similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Finally, we establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating stochastic training strategies to stochastic dynamical systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve a significant improvement over the original LM-ResNet on CIFAR10.
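The abstract's central claim, that a ResNet block is a numerical discretization of a differential equation, amounts to the following well-known identification (a sketch of the correspondence, not quoted from the paper):

```latex
% A ResNet block performs one forward-Euler step of \dot{u} = f(u):
%   ResNet block:   x_{l+1} = x_l + f(x_l)
%   Forward Euler:  u_{n+1} = u_n + \Delta t \, f(u_n)
% Identifying the layer index l with the time step n and setting
% \Delta t = 1 makes the two updates identical; other discretization
% schemes then suggest other architectures, e.g. multi-step schemes
% suggest the LM-architecture sketched below.
\[
  x_{l+1} = x_l + f(x_l)
  \quad\Longleftrightarrow\quad
  u_{n+1} = u_n + \Delta t\, f(u_n), \qquad \Delta t = 1 .
\]
```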
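The LM-architecture replaces this one-step update with a linear two-step update that reuses the previous feature map. A minimal PyTorch sketch follows, assuming the update x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n) with a trainable scalar k_n per block; the module names (`ResidualBranch`, `LMBlock`) and the zero initialization of `k` are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBranch(nn.Module):
    """A standard pre-activation residual branch f(x): BN-ReLU-Conv, twice."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        return self.conv2(F.relu(self.bn2(out)))

class LMBlock(nn.Module):
    """Linear two-step block: x_{n+1} = (1 - k) * x_n + k * x_{n-1} + f(x_n).

    k is a trainable scalar, so the network learns per-block step
    coefficients; k = 0 recovers an ordinary one-step ResNet block.
    """
    def __init__(self, channels):
        super().__init__()
        self.f = ResidualBranch(channels)
        self.k = nn.Parameter(torch.zeros(1))  # illustrative initialization

    def forward(self, x_curr, x_prev):
        x_next = (1 - self.k) * x_curr + self.k * x_prev + self.f(x_curr)
        return x_next, x_curr  # shift the two-step state window

# Usage: carry the two most recent states as the multi-step history.
blocks = nn.ModuleList(LMBlock(64) for _ in range(4))
x_prev = x = torch.randn(2, 64, 32, 32)
for block in blocks:
    x, x_prev = block(x, x_prev)
print(x.shape)  # torch.Size([2, 64, 32, 32])
```

Since k = 0 reduces each block to a plain residual update, the scheme strictly generalizes ResNet while adding only one extra parameter per block.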
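The abstract's last point, that stochastic training strategies correspond to stochastic dynamical systems, makes it natural to drop the residual branch at random during training. Below, the `LMBlock` sketch above is extended with standard stochastic depth (Bernoulli gating of the branch with test-time rescaling); the survival probability of 0.8 is an assumed placeholder, not a value from the paper.

```python
class StochasticLMBlock(LMBlock):
    """LMBlock (defined in the previous sketch) with stochastic depth.

    During training, the residual branch f(x) is kept with probability
    `survival` via a Bernoulli gate drawn per forward pass; at test time
    the branch output is rescaled by the same probability instead.
    """
    def __init__(self, channels, survival=0.8):
        super().__init__(channels)
        self.survival = survival

    def forward(self, x_curr, x_prev):
        if self.training:
            # Bernoulli gate: keep the branch with probability `survival`.
            keep = (torch.rand((), device=x_curr.device) < self.survival)
            residual = self.f(x_curr) * keep.float()
        else:
            # Deterministic test-time rescaling by the survival probability.
            residual = self.f(x_curr) * self.survival
        x_next = (1 - self.k) * x_curr + self.k * x_prev + residual
        return x_next, x_curr
```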
Pages: 10