Sharp asymptotics on the compression of two-layer neural networks

Cited by: 0
Authors
Amani, Mohammad Hossein [1]
Bombari, Simone [2]
Mondelli, Marco [2]
Pukdee, Rattana [3]
Rini, Stefano [4]
Affiliations
[1] École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
[2] Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
[3] Carnegie Mellon University, Pittsburgh, PA, USA
[4] National Yang Ming Chiao Tung University, Hsinchu, Taiwan
DOI
10.1109/ITW54588.2022.9965870
CLC number
TP [automation and computer technology];
Discipline classification code
0812;
Abstract
In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
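To make the objective in the abstract concrete, below is a minimal Monte Carlo sketch in NumPy of the population L2 loss between a target and a compressed two-layer ReLU network under standard Gaussian inputs. The specific parameterization (uniform 1/N output weights for the target, free outer weights b for the compressed network), the widths, and the sample size are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Illustrative sketch of the compression objective: a target two-layer ReLU
# network with N nodes is compared against a compressed network with M < N
# nodes, and the population L2 loss E_x[(f(x) - g(x))^2] over x ~ N(0, I_d)
# is estimated by sampling. Parameterization below is an assumption:
#   target:      f(x) = (1/N) * sum_i relu(w_i^T x),  w_i i.i.d. Gaussian
#   compressed:  g(x) = sum_j b_j * relu(u_j^T x)

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def target_output(X, W):
    # X: (n_samples, d), W: (N, d). Average of the N ReLU units (assumed scaling).
    return relu(X @ W.T).mean(axis=1)

def compressed_output(X, U, b):
    # U: (M, d) inner weights, b: (M,) outer weights of the compressed network.
    return relu(X @ U.T) @ b

def population_l2_loss(W, U, b, n_samples=200_000):
    # Monte Carlo estimate of E_x[(f(x) - g(x))^2] over standard Gaussian inputs.
    d = W.shape[1]
    X = rng.standard_normal((n_samples, d))
    diff = target_output(X, W) - compressed_output(X, U, b)
    return float(np.mean(diff ** 2))

if __name__ == "__main__":
    d, N, M = 32, 64, 4                            # input dim, target and compressed widths
    W = rng.standard_normal((N, d))                # i.i.d. Gaussian target weights
    U = rng.standard_normal((M, d)) / np.sqrt(d)   # arbitrary initial compressed weights
    b = np.full(M, 1.0 / M)                        # arbitrary initial outer weights
    print("estimated population L2 loss:", population_l2_loss(W, U, b))
```

Minimizing this estimate over (U, b) is the non-convex problem the paper studies; the conjecture in the abstract concerns the structure (an Equiangular Tight Frame) of the optimal inner weights U in the over-parameterized limit.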
Pages: 588-593
Page count: 6