On the Impact of the Activation Function on Deep Neural Networks Training

Cited by: 0
Authors:
Hayou, Soufiane [1]
Doucet, Arnaud [1]
Rousseau, Judith [1]
Affiliations:
[1] Univ Oxford, Dept Stat, Oxford, England
Keywords:
DOI: Not available
CLC number (Chinese Library Classification): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate choice can lead to the loss of input information during forward propagation and to exponentially vanishing or exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'Edge of Chaos', can lead to good performance. While Schoenholz et al. (2017) discuss trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate training and improve performance.
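As a toy illustration of the phenomenon described in the abstract, the NumPy sketch below (not the authors' code; the variance settings are assumptions chosen only for illustration) propagates two distinct inputs through the same randomly initialized deep tanh network and tracks their cosine similarity layer by layer. In an "ordered" regime the two inputs collapse onto each other, so input information is lost during forward propagation; in a "chaotic" regime they are driven apart; initializations near the Edge of Chaos sit between the two.

import numpy as np

def correlation_depth_profile(depth=50, width=512, sigma_w=1.0, sigma_b=0.05, seed=0):
    """Propagate two distinct inputs through the SAME random tanh layers and
    return their cosine similarity after each layer."""
    rng = np.random.default_rng(seed)
    h1 = rng.normal(size=width)
    h2 = rng.normal(size=width)
    sims = []
    for _ in range(depth):
        # Weights ~ N(0, sigma_w^2 / width), biases ~ N(0, sigma_b^2), as in standard
        # mean-field analyses of random feedforward networks.
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h1 = np.tanh(W @ h1 + b)
        h2 = np.tanh(W @ h2 + b)
        sims.append(float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12)))
    return sims

# Illustrative regimes for a tanh network with small bias variance (assumed values):
#   sigma_w = 0.8 -> ordered phase: distinct inputs become nearly identical with depth
#   sigma_w = 2.0 -> chaotic phase: similarity decays, nearby inputs are pulled apart
#   sigma_w ~ 1.0 -> near the Edge of Chaos, where deep signal propagation is best behaved
for sw in (0.8, 1.0, 2.0):
    sims = correlation_depth_profile(sigma_w=sw)
    print(f"sigma_w={sw}: cosine similarity at depth 50 = {sims[-1]:.3f}")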
Pages: 9
Related articles
50 results in total
  • [1] The Impact of Architecture on the Deep Neural Networks Training
    Rozycki, Pawel
    Kolbusz, Janusz
    Malinowski, Aleksander
    Wilamowski, Bogdan
    2019 12TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2019: 41-46
  • [2] RSigELU: A nonlinear activation function for deep neural networks
    Kilicarslan, Serhat
    Celik, Mete
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 174
  • [3] The impact of activation functions on training and performance of a deep neural network
    Marcu, David C.
    Grava, Cristian
    2021 16TH INTERNATIONAL CONFERENCE ON ENGINEERING OF MODERN ELECTRIC SYSTEMS (EMES), 2021: 126-129
  • [4] An Efficient Asymmetric Nonlinear Activation Function for Deep Neural Networks
    Chai, Enhui
    Yu, Wei
    Cui, Tianxiang
    Ren, Jianfeng
    Ding, Shusheng
    SYMMETRY-BASEL, 2022, 14 (05)
  • [5] Regularized Flexible Activation Function Combination for Deep Neural Networks
    Jie, Renlong
    Gao, Junbin
    Vasnev, Andrey
    Tran, Minh-ngoc
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 2001-2008
  • [6] NIPUNA: A Novel Optimizer Activation Function for Deep Neural Networks
    Madhu, Golla
    Kautish, Sandeep
    Alnowibet, Khalid Abdulaziz
    Zawbaa, Hossam M. M.
    Mohamed, Ali Wagdy
    AXIOMS, 2023, 12 (03)
  • [7] Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function
    Nag, Sayan
    Bhattacharyya, Mayukh
    Mukherjee, Anuraag
    Kundu, Rohit
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 5313-5322
  • [8] Approximating smooth functions by deep neural networks with sigmoid activation function
    Langer, Sophie
    JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 182
  • [9] Smooth Function Approximation by Deep Neural Networks with General Activation Functions
    Ohn, Ilsang
    Kim, Yongdai
    ENTROPY, 2019, 21 (07)
  • [10] Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
    Rhu, Minsoo
    O'Connor, Mike
    Chatterjee, Niladrish
    Pool, Jeff
    Kwon, Youngeun
    Keckler, Stephen W.
    2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2018: 78-91