Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Authors
Eftekhari, Armin [1]
Affiliations
[1] Umea Univ, Dept Math & Math Stat, Umea, Sweden
Keywords
PRINCIPAL COMPONENTS; MATRIX; APPROXIMATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. This paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, this work develops a machinery to provably identify the stable set of gradient flow, which then enables us to improve over the state of the art in the literature of linear networks (Bah et al., 2019; Arora et al., 2018a). Crucially, our results appear to be the first to break away from the lazy training regime which has dominated the literature of neural networks. This work requires the network to have a layer with one neuron, which subsumes the networks with a scalar output, but extending the results of this theoretical work to all linear networks remains a challenging open problem.
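As a rough illustration of the setting described in the abstract, the sketch below trains a two-layer linear network whose hidden layer has a single neuron. It is not the paper's construction: gradient flow is approximated by gradient descent with a small step, the inputs are assumed whitened so that the loss depends only on the product W2 W1, and the function name train_linear_network, the rank-1 target Y, the initialization, the step size, and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np

def train_linear_network(Y, W1, W2, step=0.01, iters=5000):
    """Gradient descent on 0.5 * ||W2 @ W1 - Y||_F^2.

    W1 has shape (1, d) and W2 has shape (m, 1), so the hidden layer
    has exactly one neuron. A small step size is used as a crude
    discretization of gradient flow (an assumption of this sketch).
    """
    losses = []
    for _ in range(iters):
        R = W2 @ W1 - Y                  # residual, shape (m, d)
        losses.append(0.5 * np.sum(R ** 2))
        grad_W1 = W2.T @ R               # gradient w.r.t. W1, shape (1, d)
        grad_W2 = R @ W1.T               # gradient w.r.t. W2, shape (m, 1)
        W1 = W1 - step * grad_W1
        W2 = W2 - step * grad_W2
    R = W2 @ W1 - Y
    losses.append(0.5 * np.sum(R ** 2))  # loss after the final update
    return W1, W2, losses

# Hypothetical rank-1 target, which a one-neuron hidden layer can fit exactly.
Y = np.outer([1.0, 2.0], [1.0, 0.0, -1.0])

# Small, non-zero initialization chosen by hand for this example.
W1_0 = np.array([[0.3, 0.2, 0.1]])
W2_0 = np.array([[0.1], [0.1]])

W1, W2, losses = train_linear_network(Y, W1_0, W2_0)
print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

The initialization here starts positively correlated with the target, so the iterates move away from the saddle at the origin. Which initializations do so in general is the kind of question the paper addresses, and this sketch does not attempt to answer it.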
Pages: 12
Related papers
50 in total
  • [1] Training Linear Neural Networks: Non-Local Convergence and Complexity Results
    Eftekhari, Armin
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [2] On Non-local Convergence Analysis of Deep Linear Networks
    Chen, Kun
    Lin, Dachao
    Zhang, Zhihua
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [3] Non-local Neural Networks
    Wang, Xiaolong
    Girshick, Ross
    Gupta, Abhinav
    He, Kaiming
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7794 - 7803
  • [4] Non-Local Graph Neural Networks
    Liu, Meng
    Wang, Zhengyang
    Ji, Shuiwang
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 10270 - 10276
  • [5] Non-local correlations between separated neural networks
    Pizzi, R
    Fantasia, A
    Gelain, F
    Rossetti, D
    Vescovi, A
    [J]. QUANTUM INFORMATION AND COMPUTATION II, 2004, 5436 : 107 - 117
  • [6] Contextualized Non-Local Neural Networks for Sequence Learning
    Liu, Pengfei
    Chang, Shuaichen
    Huang, Xuanjing
    Tang, Jian
    Cheung, Jackie Chi Kit
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 6762 - 6769
  • [7] Asymmetric Non-local Neural Networks for Semantic Segmentation
    Zhu, Zhen
    Xu, Mengde
    Bai, Song
    Huang, Tengteng
    Bai, Xiang
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 593 - 602
  • [8] Detecting moonquakes using convolutional neural networks, a non-local training set, and transfer learning
    Civilini, F.
    Weber, R. C.
    Jiang, Z.
    Phillips, D.
    Pan, W. David
    [J]. GEOPHYSICAL JOURNAL INTERNATIONAL, 2021, 225 (03) : 2120 - 2134
  • [9] Non-local impact of link failures in linear flow networks
    Strake, Julius
    Kaiser, Franz
    Basiri, Farnaz
    Ronellenfitsch, Henrik
    Witthaut, Dirk
    [J]. NEW JOURNAL OF PHYSICS, 2019, 21 (05):
  • [10] Non-Local Parameterization of Atmospheric Subgrid Processes With Neural Networks
    Wang, Peidong
    Yuval, Janni
    O'Gorman, Paul A.
    [J]. JOURNAL OF ADVANCES IN MODELING EARTH SYSTEMS, 2022, 14 (10)