On the Impact of the Activation Function on Deep Neural Networks Training

Cited by: 0
Authors
Hayou, Soufiane [1 ]
Doucet, Arnaud [1 ]
Rousseau, Judith [1 ]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford, England
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate choice can lead to the loss of input information during forward propagation and to exponentially vanishing or exploding gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks can be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters, known as the 'Edge of Chaos', can lead to good performance. While Schoenholz et al. (2017) address trainability, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that the initialization parameters and the activation function can indeed be tuned to accelerate training and improve performance.
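The 'Edge of Chaos' phenomenon described in the abstract can be illustrated numerically. The following is a minimal, illustrative Python sketch, not code from the paper: it propagates two moderately correlated inputs through deep random tanh networks and reports their correlation at depth. The choice (sigma_w, sigma_b) = (1, 0) as the Edge of Chaos point for tanh is a known result from this line of work; the width, depth, and phase settings used here are illustrative assumptions.

import numpy as np

def correlation_at_depth(sigma_w, sigma_b, width=500, depth=100, seed=0):
    """Propagate two correlated inputs through a random tanh network and
    return their empirical correlation after `depth` layers."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    # Second input with correlation ~0.5 to the first one.
    x2 = 0.5 * x1 + np.sqrt(0.75) * rng.standard_normal(width)
    h1, h2 = x1, x2
    for _ in range(depth):
        # i.i.d. Gaussian initialization: W_ij ~ N(0, sigma_w^2/width), b_i ~ N(0, sigma_b^2)
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        h1, h2 = np.tanh(W @ h1 + b), np.tanh(W @ h2 + b)
    return np.corrcoef(h1, h2)[0, 1]

# Ordered phase: correlations converge exponentially fast to 1 (distinct inputs collapse).
# Chaotic phase: correlations converge to a fixed point strictly below 1 (inputs decorrelate).
# Edge of Chaos ((sigma_w, sigma_b) = (1, 0) for tanh): correlations evolve only
# sub-exponentially, so input geometry survives much deeper propagation.
settings = {"ordered":       (0.5, 0.3),
            "edge of chaos": (1.0, 0.0),
            "chaotic":       (3.0, 0.3)}
for name, (sw, sb) in settings.items():
    print(f"{name:14s} correlation after 100 layers: {correlation_at_depth(sw, sb):+.3f}")

Running this sketch shows the qualitative behaviour the abstract refers to: away from the Edge of Chaos, information about the input geometry is lost exponentially fast in depth, whereas on the Edge of Chaos it degrades only slowly.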
Pages: 9
Related Papers
50 records in total
  • [21] A parameterized activation function for learning fuzzy logic operations in deep neural networks
    Godfrey, Luke B.
    Gashler, Michael S.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 740 - 745
  • [22] Periodic Function as Activation Function for Neural Networks
    Xu, Ding
    Guan, Yue
    Cai, Ping-ping
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: TECHNIQUES AND APPLICATIONS, AITA 2016, 2016, : 179 - 183
  • [23] Is normalization indispensable for training deep neural networks?
    Shao, Jie
    Hu, Kai
    Wang, Changhu
    Xue, Xiangyang
    Raj, Bhiksha
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [24] Exploiting Invariance in Training Deep Neural Networks
    Ye, Chengxi
    Zhou, Xiong
    McKinney, Tristan
    Liu, Yanfeng
    Zhou, Qinggang
    Zhdanov, Fedor
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8849 - 8856
  • [25] On Calibration of Mixup Training for Deep Neural Networks
    Maronas, Juan
    Ramos, Daniel
    Paredes, Roberto
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2020, 2021, 12644 : 67 - 76
  • [26] Training Deep Neural Networks with Gradual Deconvexification
    Lo, James Ting-Ho
    Gui, Yichuan
    Peng, Yun
    [J]. 2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1000 - 1007
  • [27] Training Deep Neural Networks for Visual Servoing
    Bateux, Quentin
    Marchand, Eric
    Leitner, Jurgen
    Chaumette, Francois
    Corke, Peter
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 3307 - 3314
  • [28] Local Critic Training of Deep Neural Networks
    Lee, Hojung
    Lee, Jong-Seok
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [29] An Optimization Strategy for Deep Neural Networks Training
    Wu, Tingting
    Zeng, Peng
    Song, Chunhe
    [J]. 2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 596 - 603
  • [30] Exploring Strategies for Training Deep Neural Networks
    Larochelle, Hugo
    Bengio, Yoshua
    Louradour, Jerome
    Lamblin, Pascal
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 1 - 40