The loss surfaces of neural networks with general activation functions

Cited by: 11
Authors
Baskerville, Nicholas P. [1 ]
Keating, Jonathan P. [2 ]
Mezzadri, Francesco [1 ]
Najnudel, Joseph [1 ]
Affiliations
[1] Univ Bristol, Sch Math, Fry Bldg, Bristol BS8 1UG, Avon, England
[2] Univ Oxford, Math Inst, Oxford OX2 6GG, England
Funding
European Research Council
Keywords
deep learning; random matrix theory and extensions; machine learning; spin glasses; metastable states; complexity; landscape; asymptotics; matrices
DOI
10.1088/1742-5468/abfa1e
Chinese Library Classification
O3 [Mechanics]
Discipline Classification Code
08; 0801
Abstract
The loss surfaces of deep neural networks have been the subject of several theoretical and experimental studies over the last few years. One strand of work considers the complexity, in the sense of the number of local optima, of high-dimensional random functions, with the aim of understanding how local optimisation methods may perform in such complicated settings. Prior work of Choromanska et al (2015) established a direct link between the training loss surfaces of deep multi-layer perceptron networks and spherical multi-spin glass models, under some very strong assumptions on the network and its data. In this work, we test the validity of this approach by removing the undesirable restriction to ReLU activation functions. In doing so, we chart a new path through the spin glass complexity calculations using supersymmetric methods in random matrix theory, which may prove useful in other contexts. Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.
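To make the model concrete: the spherical multi-spin glass referenced above is, in its generic textbook form, the random function H(sigma) on the sphere |sigma|^2 = N given by a Gaussian sum of degree-p monomials in the spins. The following minimal Python sketch (our own illustration, not code from the paper; the function name, p = 3, and N = 50 are assumptions) samples one such Hamiltonian and evaluates it at a random point on the sphere.

    # Minimal sketch of a spherical p-spin glass Hamiltonian (p = 3).
    # Assumption: generic textbook form of the model, not code from the paper.
    import numpy as np

    def p_spin_hamiltonian(sigma, J):
        # H(sigma) = sum_{i,j,k} J[i,j,k] * sigma_i * sigma_j * sigma_k,
        # with i.i.d. Gaussian couplings J and sigma on the sphere |sigma|^2 = N.
        return float(np.einsum('ijk,i,j,k->', J, sigma, sigma, sigma))

    N = 50
    rng = np.random.default_rng(0)
    # Couplings scaled by N^{-(p-1)/2} = 1/N, so a typical H value is O(sqrt(N)).
    J = rng.normal(scale=1.0 / N, size=(N, N, N))
    sigma = rng.normal(size=N)
    sigma *= np.sqrt(N) / np.linalg.norm(sigma)  # project onto sphere of radius sqrt(N)
    print(p_spin_hamiltonian(sigma, J))

The "complexity" studied in the paper is, roughly, the asymptotic growth of the expected number of local optima of such an H as N tends to infinity.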
Pages: 71