Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

Cited by: 90
Authors
Jagtap, Ameya D. [1 ]
Shin, Yeonjong [1 ]
Kawaguchi, Kenji [2 ]
Karniadakis, George Em [1 ,3 ]
Affiliations
[1] Brown Univ, Div Appl Math, 182 George St, Providence, RI 02912 USA
[2] Harvard Univ, Ctr Math Sci & Applicat, Cambridge, MA 02138 USA
[3] Brown Univ, Sch Engn, Providence, RI 02912 USA
Keywords
Deep neural networks; Kronecker product; Rowdy activation functions; Gradient flow dynamics; Physics-informed neural networks; Deep learning benchmarks; Learning framework
DOI
10.1016/j.neucom.2021.10.036
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We propose a new type of neural network, the Kronecker neural network (KNN), which forms a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that, under suitable conditions, KNNs induce a faster decay of the loss than feed-forward networks; this is also verified empirically through a set of computational examples. Furthermore, under certain technical assumptions, we establish global convergence of gradient descent for KNNs. As a specific case, we propose the Rowdy activation function, which is designed to eliminate saturation regions by injecting sinusoidal fluctuations with trainable parameters. The proposed Rowdy activation function can be employed in any neural network architecture, such as feed-forward, recurrent, and convolutional neural networks. The effectiveness of KNNs with Rowdy activations is demonstrated through various computational experiments, including function approximation using feed-forward neural networks, solution inference of partial differential equations using physics-informed neural networks, and standard deep learning benchmark problems using convolutional and fully connected neural networks. (c) 2021 Elsevier B.V. All rights reserved.
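The Rowdy activation described in the abstract can be read as a standard (saturating) base activation augmented with a small number of trainable sinusoidal terms. The Python/NumPy sketch below illustrates only this reading; the function name rowdy_activation, the fixed scaling factor n, the frequencies k*n, and the zero initialization of the trainable amplitudes are illustrative assumptions, not the exact parameterization given in the paper.

    import numpy as np

    def rowdy_activation(x, alphas, n=10.0, base=np.tanh):
        """Base activation plus trainable sinusoidal perturbations (sketch).

        x      : pre-activation array
        alphas : trainable amplitudes, one per sinusoidal term
        n      : assumed fixed scaling factor controlling perturbation frequency
        base   : saturating base activation (e.g. tanh)
        """
        out = base(x)
        for k, a_k in enumerate(alphas, start=1):
            # Each term adds a trainable sinusoidal fluctuation that breaks up
            # flat (saturated) regions of the base activation.
            out = out + a_k * n * np.sin(k * n * x)
        return out

    # Toy forward pass of a single hidden layer using the adaptive activation.
    rng = np.random.default_rng(0)
    W, b = rng.standard_normal((16, 1)), np.zeros((16, 1))
    alphas = np.zeros(3)          # zero amplitudes: recovers the plain tanh network
    x = np.linspace(-1.0, 1.0, 5).reshape(1, -1)
    h = rowdy_activation(W @ x + b, alphas)
    print(h.shape)                # (16, 5)

With the amplitudes initialized to zero, the layer starts out as an ordinary tanh network, and the sinusoidal terms only take effect as the amplitudes are learned during training.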
Pages: 165-180
Number of pages: 16
Related papers
50 records in total
  • [31] Adaptive Weight Decay for Deep Neural Networks
    Nakamura, Kensuke
    Hong, Byung-Woo
    [J]. IEEE ACCESS, 2019, 7 : 118857 - 118865
  • [32] Adaptive propagation deep graph neural networks
    Chen, Wei
    Yan, Wenxu
    Wang, Wenyuan
    [J]. PATTERN RECOGNITION, 2024, 154
  • [33] On the approximation of rough functions with deep neural networks
    De Ryck T.
    Mishra S.
    Ray D.
    [J]. SeMA Journal, 2022, 79 (3) : 399 - 440
  • [34] Deep Convolutional Neural Networks on Cartoon Functions
    Grohs, Philipp
    Wiatowski, Thomas
    Bolcskei, Helmut
    [J]. 2016 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2016, : 1163 - 1167
  • [35] Adaptive Morphing Activation Function for Neural Networks
    Herrera-Alcantara, Oscar
    Arellano-Balderas, Salvador
    [J]. FRACTAL AND FRACTIONAL, 2024, 8 (08)
  • [36] Neural networks with adaptive spline activation function
    Campolucci, P
    Capparelli, F
    Guarnieri, S
    Piazza, F
    Uncini, A
    [J]. MELECON '96 - 8TH MEDITERRANEAN ELECTROTECHNICAL CONFERENCE, PROCEEDINGS, VOLS I-III: INDUSTRIAL APPLICATIONS IN POWER SYSTEMS, COMPUTER SCIENCE AND TELECOMMUNICATIONS, 1996, : 1442 - 1445
  • [37] Quantum activation functions for quantum neural networks
    Marco Maronese
    Claudio Destri
    Enrico Prati
    [J]. Quantum Information Processing, 21
  • [38] A Comparison of Activation Functions in Artificial Neural Networks
    Bircanoglu, Cenk
    Arica, Nafiz
    [J]. 2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018
  • [39] Hölder Continuous Activation Functions in Neural Networks
    Tatar, Nasser-Eddine
    [J]. ADVANCES IN DIFFERENTIAL EQUATIONS AND CONTROL PROCESSES, 2015, 15 (02): : 93 - 106
  • [40] General adaptive transfer functions design for volume rendering by using neural networks
    Wang, Liansheng
    Chen, Xucan
    Li, Sikun
    Cai, Xun
    [J]. NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 659 - 670