A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer

Cited by: 0
Authors
Warmuth, Manfred K. [1 ]
Kotlowski, Wojciech [2 ]
Amid, Ehsan [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Poznan Univ Tech, Poznan, Poland
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
It was conjectured that no neural network, of any architecture and with arbitrary differentiable transfer functions at its nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a d-dimensional Hadamard matrix and the target is a single one of the d features, i.e. very sparse. We essentially prove this conjecture: we show that after receiving a random training set of size k < d, the expected squared loss is still 1 - k/(d-1). The only requirements are that the input layer is fully connected and that the initial weight vectors of the input nodes are drawn from a rotation-invariant distribution. Surprisingly, the same type of problem can be solved drastically more efficiently by a simple two-layer linear network in which the d inputs are connected to the output node by chains of length 2, so that the input layer has only one edge per input. When such a spindly network is trained by gradient descent, its expected squared loss has been shown to be (log d)/k. Our lower bounds therefore show that a sparse input layer is essential for learning sparse targets sample-efficiently with gradient descent.
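To make the separation concrete, here is a minimal numpy sketch (our illustration, not the authors' code; the dimension d, learning rate, step count, and initialization scale are arbitrary demo choices). It trains both architectures on the Hadamard problem: a dense linear model whose input layer is fully connected, and the spindly reparameterization in which each input reaches the output only through its own length-2 chain, so the prediction is sum_i u_i * v_i * x_i.

```python
import numpy as np

def hadamard(d):
    """Sylvester construction: d x d Hadamard matrix for d a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d, k = 64, 16                        # dimension and training-set size, k < d
rng = np.random.default_rng(0)
H = hadamard(d)
perm = rng.permutation(d)
X_tr, X_te = H[perm[:k]], H[perm[k:]]
target = 3                           # the target is a single feature (very sparse)
y_tr, y_te = X_tr[:, target], X_te[:, target]

# (a) fully connected input layer: dense linear model with a rotation-invariant
# (Gaussian) initialization, trained by plain gradient descent on squared loss
w = rng.normal(scale=0.01, size=d)
for _ in range(2000):
    w -= 0.01 * X_tr.T @ (X_tr @ w - y_tr) / k

# (b) spindly network: prediction = sum_i u_i * v_i * x_i, one edge per input
u = np.full(d, 0.1)                  # both weights on each chain start small
v = np.full(d, 0.1)
for _ in range(2000):
    g = X_tr.T @ (X_tr @ (u * v) - y_tr) / k    # gradient w.r.t. the product u*v
    u, v = u - 0.01 * g * v, v - 0.01 * g * u   # chain rule through both edges

print("dense   test loss:", np.mean((X_te @ w - y_te) ** 2))
print("spindly test loss:", np.mean((X_te @ (u * v) - y_te) ** 2))
```

With these (arbitrary) settings one should see the dense model's held-out loss stay near the 1 - k/(d-1) lower bound, roughly 0.75 here, while the spindly network's loss falls far below it: gradient descent on the product u_i v_i updates each effective weight multiplicatively, which is what lets the chained network lock onto the sparse target quickly.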
Pages: 23
Related Papers (50 in total)
  • [1] A linear approach for sparse coding by a two-layer neural network
    Montalto, Alessandro
    Tessitore, Giovanni
    Prevete, Roberto
    [J]. NEUROCOMPUTING, 2015, 149 : 1315 - 1323
  • [2] Dynamics of the two-layer pseudoinverse neural network
    Li, SJ
    Huang, WQ
    Chen, TL
    [J]. CHINESE SCIENCE BULLETIN, 1995, 40 (20): 1691 - 1694
  • [3] Pyramidal neuron as two-layer neural network
    Poirazi, P
    Brannon, T
    Mel, BW
    [J]. NEURON, 2003, 37 (06) : 989 - 999
  • [4] Propagating interfaces in a two-layer bistable neural network
    Kazantsev, V. B.
    Nekorkin, V. I.
    Morfu, S.
    Bilbault, J. M.
    Marquié, P.
    [J]. INTERNATIONAL JOURNAL OF BIFURCATION AND CHAOS, 2006, 16 (03): : 589 - 600
  • [5] Generalization in a two-layer neural network with multiple outputs
    Kang, KJ
    Oh, JH
    Kwon, C
    Park, Y
    [J]. PHYSICAL REVIEW E, 1996, 54 (02): : 1811 - 1815
  • [6] Two-layer tree-connected feed-forward neural network model for neural cryptography
    Lei, Xinyu
    Liao, Xiaofeng
    Chen, Fei
    Huang, Tingwen
    [J]. PHYSICAL REVIEW E, 2013, 87 (03)
  • [7] Compression of Fully-Connected Layer in Neural Network by Kronecker Product
    Wu, Jia-Nan
    [J]. 2016 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2016, : 173 - 179
  • [8] A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA
    Nakahara, Hiroki
    Fujii, Tomoya
    Sato, Shimpei
    [J]. 2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017