A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer

Cited by: 0
Authors
Warmuth, Manfred K. [1 ]
Kotlowski, Wojciech [2 ]
Amid, Ehsan [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Poznan Univ Tech, Poznan, Poland
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
It was conjectured that no neural network, of any architecture and with arbitrary differentiable transfer functions at its nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a d-dimensional Hadamard matrix and the target is a single one of the d features, i.e. very sparse. We essentially prove this conjecture: we show that after receiving a random training set of size k < d, the expected squared loss is still 1 - k/(d-1). The only requirements are that the input layer is fully connected and that the initial weight vectors of the input nodes are drawn from a rotation-invariant distribution. Surprisingly, the same type of problem can be solved drastically more efficiently by a simple two-layer linear network in which the d inputs are connected to the output node by chains of length 2, so that the input layer has only one edge per input. When such a spindly network is trained by gradient descent, its expected squared loss has been shown to be (log d)/k. Our lower bounds therefore show that a sparse input layer is essential for learning sparse targets sample-efficiently with gradient descent.
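To make the separation concrete, here is a minimal numpy sketch (our illustration, not the authors' code; the dimension d, learning rate, step count, and initialization scale are arbitrary demo choices). It trains both architectures on the Hadamard problem: a dense linear model whose input layer is fully connected, and the spindly reparameterization in which each input reaches the output only through its own length-2 chain, so the prediction is sum_i u_i * v_i * x_i.

```python
import numpy as np

def hadamard(d):
    """Sylvester construction: d x d Hadamard matrix for d a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d, k = 64, 16                        # dimension and training-set size, k < d
rng = np.random.default_rng(0)
H = hadamard(d)
perm = rng.permutation(d)
X_tr, X_te = H[perm[:k]], H[perm[k:]]
target = 3                           # the target is a single feature (very sparse)
y_tr, y_te = X_tr[:, target], X_te[:, target]

# (a) fully connected input layer: dense linear model with a rotation-invariant
# (Gaussian) initialization, trained by plain gradient descent on squared loss
w = rng.normal(scale=0.01, size=d)
for _ in range(2000):
    w -= 0.01 * X_tr.T @ (X_tr @ w - y_tr) / k

# (b) spindly network: prediction = sum_i u_i * v_i * x_i, one edge per input
u = np.full(d, 0.1)                  # both weights on each chain start small
v = np.full(d, 0.1)
for _ in range(2000):
    g = X_tr.T @ (X_tr @ (u * v) - y_tr) / k    # gradient w.r.t. the product u*v
    u, v = u - 0.01 * g * v, v - 0.01 * g * u   # chain rule through both edges

print("dense   test loss:", np.mean((X_te @ w - y_te) ** 2))
print("spindly test loss:", np.mean((X_te @ (u * v) - y_te) ** 2))
```

With these (arbitrary) settings one should see the dense model's held-out loss stay near the 1 - k/(d-1) lower bound, roughly 0.75 here, while the spindly network's loss falls far below it: gradient descent on the product u_i v_i updates each effective weight multiplicatively, which is what lets the chained network lock onto the sparse target quickly.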
Pages: 23
Related Papers (50 in total)
  • [1] A linear approach for sparse coding by a two-layer neural network
    Montalto, Alessandro
    Tessitore, Giovanni
    Prevete, Roberto
    [J]. NEUROCOMPUTING, 2015, 149 : 1315 - 1323
  • [2] Dynamics of the two-layer pseudoinverse neural network
    Li, SJ
    Huang, WQ
    Chen, TL
    [J]. CHINESE SCIENCE BULLETIN, 1995, 40 (20): 1691 - 1694
  • [3] Pyramidal neuron as two-layer neural network
    Poirazi, P
    Brannon, T
    Mel, BW
    [J]. NEURON, 2003, 37 (06) : 989 - 999
  • [4] Propagating interfaces in a two-layer bistable neural network
    Kazantsev, V. B.
    Nekorkin, V. I.
    Morfu, S.
    Bilbault, J. M.
    Marquié, P.
    [J]. INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2006, 16 (03): : 589 - 600
  • [5] Generalization in a two-layer neural network with multiple outputs
    Kang, KJ
    Oh, JH
    Kwon, C
    Park, Y
    [J]. PHYSICAL REVIEW E, 1996, 54 (02): : 1811 - 1815
  • [6] Two-layer tree-connected feed-forward neural network model for neural cryptography
    Lei, Xinyu
    Liao, Xiaofeng
    Chen, Fei
    Huang, Tingwen
    [J]. PHYSICAL REVIEW E, 2013, 87 (03)
  • [7] Compression of Fully-Connected Layer in Neural Network by Kronecker Product
    Wu, Jia-Nan
    [J]. 2016 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2016, : 173 - 179
  • [8] A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA
    Nakahara, Hiroki
    Fujii, Tomoya
    Sato, Shimpei
    [J]. 2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017