Convergence of deep ReLU networks

Cited by: 10
Authors
Xu, Yuesheng [1]
Zhang, Haizhang [2]
Affiliations
[1] Old Dominion Univ, Dept Math & Stat, Norfolk, VA 23529 USA
[2] Sun Yat-sen Univ, Sch Math Zhuhai, Zhuhai, Peoples R China
Funding
US National Science Foundation; National Natural Science Foundation of China; US National Institutes of Health;
Keywords
Deep learning; ReLU networks; Activation domains; Infinite product of matrices; Error bounds; Width;
DOI
10.1016/j.neucom.2023.127174
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We explore the convergence of deep neural networks with the popular ReLU activation function as the depth of the networks tends to infinity. To this end, we introduce the notions of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function with multiplications by activation matrices on activation domains, we obtain an explicit expression for the ReLU network. We then identify the convergence of the ReLU networks with the convergence of a class of infinite products of matrices, and study necessary and sufficient conditions for the convergence of these infinite products. As a result, we establish that for ReLU networks to converge it is necessary that the sequence of weight matrices converges to the identity matrix and the sequence of bias vectors converges to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions, in terms of the weight matrices and bias vectors at the hidden layers, for the pointwise convergence of deep ReLU networks. These results provide mathematical insight into the convergence of deep neural networks. Experiments are conducted to verify the results and to illustrate their potential usefulness in the initialization of deep neural networks.
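The key device described in the abstract is that, on a fixed activation domain, applying the ReLU function coordinatewise is the same as multiplying by a diagonal 0/1 activation matrix, so the network admits an explicit expression as a product of matrices. The following minimal NumPy sketch checks this equivalence numerically; the width, depth, and the geometric decay of the weights toward the identity and of the biases toward zero are illustrative assumptions chosen to mimic the convergence setting, not the paper's construction.

```python
import numpy as np

def relu(z):
    # Coordinatewise ReLU activation.
    return np.maximum(z, 0.0)

def activation_matrix(z):
    # Diagonal 0/1 matrix recording which coordinates of z are active (positive).
    return np.diag((z > 0).astype(float))

rng = np.random.default_rng(0)
width, depth = 4, 3

# Illustrative weights/biases: W_k drifts toward the identity, b_k toward zero.
Ws = [np.eye(width) + 0.1 * rng.standard_normal((width, width)) / 2**k for k in range(depth)]
bs = [0.1 * rng.standard_normal(width) / 2**k for k in range(depth)]

x = rng.standard_normal(width)

# Standard evaluation: compose ReLU(W_k h + b_k) layer by layer.
h = x
for W, b in zip(Ws, bs):
    h = relu(W @ h + b)

# Equivalent evaluation: replace each ReLU by multiplication with its activation
# matrix, so on the activation domain of x the network is a product of matrices.
g = x
for W, b in zip(Ws, bs):
    z = W @ g + b
    g = activation_matrix(z) @ z

print(np.allclose(h, g))  # True: the two evaluations agree
```

Because the activation matrices are fixed on each activation domain, letting the depth grow turns the question of convergence of the network into the convergence of an infinite product of matrices, which is the viewpoint the paper analyzes.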
Pages: 13
Related Papers
50 items in total
  • [1] Convergence rates of deep ReLU networks for multiclass classification
    Bos, Thijs
    Schmidt-Hieber, Johannes
    ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (01): : 2724 - 2773
  • [2] Provable Accelerated Convergence of Nesterov's Momentum for Deep ReLU Neural Networks
    Liao, Fangshuo
    Kyrillidis, Anastasios
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [3] On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
    Quynh Nguyen
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Nonlinear Approximation and (Deep) ReLU Networks
    Daubechies, I.
    DeVore, R.
    Foucart, S.
    Hanin, B.
    Petrova, G.
    CONSTRUCTIVE APPROXIMATION, 2022, 55 (01) : 127 - 172
  • [5] Error bounds for approximations with deep ReLU networks
    Yarotsky, Dmitry
    NEURAL NETWORKS, 2017, 94 : 103 - 114
  • [6] Approximation in L^p(μ) with deep ReLU neural networks
    Voigtlaender, Felix
    Petersen, Philipp
    2019 13TH INTERNATIONAL CONFERENCE ON SAMPLING THEORY AND APPLICATIONS (SAMPTA), 2019,
  • [7] Vanishing Curvature in Randomly Initialized Deep ReLU Networks
    Orvieto, Antonio
    Kohler, Jonas
    Pavllo, Dario
    Hofmann, Thomas
    Lucchi, Aurelien
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [8] Approximation of Nonlinear Functionals Using Deep ReLU Networks
    Song, Linhao
    Fan, Jun
    Chen, Di-Rong
    Zhou, Ding-Xuan
    JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS, 2023, 29 (04)
  • [9] A generative model for fBm with deep ReLU neural networks
    Allouche, Michaël
    Girard, Stéphane
    Gobet, Emmanuel
    JOURNAL OF COMPLEXITY, 2022, 73