A Limitation of Gradient Descent Learning

Cited by: 11
Authors
Sum, John [1 ]
Leung, Chi-Sing [2 ]
Ho, Kevin [3 ]
Affiliations
[1] Natl Chung Hsing Univ, Inst Technol Management, Taichung 40227, Taiwan
[2] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
[3] Providence Univ, Dept Comp Sci & Commun Engn, Taichung 43301, Taiwan
Keywords
Additive weight noise; gradient descent algorithms; MNIST; multiplicative weight noise; synaptic weight noise; fault tolerance; neural networks; backpropagation; injection; convergence; inputs
DOI
10.1109/TNNLS.2019.2927689
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the decades, gradient descent has been applied to develop learning algorithms for training a neural network (NN). In this brief, a limitation of applying such an algorithm to train an NN with persistent weight noise is revealed. Let V(w) be the performance measure of an ideal NN; V(w) is used to derive the gradient descent learning (GDL) algorithm. With weight noise, the desired performance measure, denoted J(w), is E[V(w̃)|w], where w̃ is the noisy weight vector. When GDL is applied to train an NN with weight noise, the actual learning objective is clearly not V(w) but another scalar function L(w). For decades, there has been a misconception that L(w) = J(w) and, hence, that the actual model attained by the GDL is the desired model. However, we show that this might not be the case: 1) with persistent additive weight noise, the actual model attained is the desired model, as L(w) = J(w); and 2) with persistent multiplicative weight noise, the actual model attained is unlikely to be the desired model, as L(w) ≠ J(w). Accordingly, the properties of the attained models, as compared with the desired models, are analyzed and the learning curves are sketched. Simulation results on 1) a simple regression problem and 2) MNIST handwritten digit recognition are presented to support our claims.
Pages: 2227-2232
Page count: 6
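
To make the abstract's distinction concrete, here is a minimal sketch (not the authors' simulation setup) of gradient descent learning on a linear regression model with persistent multiplicative weight noise. The data generation, noise level sigma_b, and all variable names are illustrative assumptions. Under these assumptions, the expected GDL update follows the gradient of V(w), so the attained weights settle near the ordinary least-squares solution, whereas the desired objective J(w) = E[V(w̃)|w] is minimized by a different, ridge-like solution, i.e., L(w) ≠ J(w).

```python
# Illustrative sketch only: GDL on linear regression with persistent
# multiplicative weight noise. V(w) is the noise-free mean squared error;
# J(w) = E[V(w_tilde)|w] is the desired objective under weight noise.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (assumed setup, not the paper's dataset).
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

sigma_b = 0.5            # std of the multiplicative weight noise (assumed)
lr, steps = 0.01, 20000  # learning rate and number of GDL steps
w = np.zeros(d)

for _ in range(steps):
    b = rng.normal(scale=sigma_b, size=d)   # persistent multiplicative noise
    w_tilde = w * (1.0 + b)                 # noisy weights actually used
    idx = rng.integers(0, n, size=64)       # mini-batch of samples
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w_tilde - yb) / len(idx)  # gradient of V at w_tilde
    w -= lr * grad

# Minimizer of V(w): ordinary least squares.
w_V = np.linalg.lstsq(X, y, rcond=None)[0]

# Minimizer of J(w) = V(w) + (sigma_b^2/2) * sum_i S_ii * w_i^2, S = X^T X / n:
# a ridge-like, shrunken solution.
S = X.T @ X / n
w_J = np.linalg.solve(S + sigma_b**2 * np.diag(np.diag(S)), X.T @ y / n)

print("GDL under multiplicative noise   :", np.round(w, 3))   # tends to land near w_V
print("argmin V (ordinary least squares):", np.round(w_V, 3))
print("argmin J (desired, ridge-like)   :", np.round(w_J, 3))
```

Under these assumptions, the first two printed vectors nearly coincide while the third differs, matching the brief's claim for multiplicative noise. For additive weight noise, the extra penalty term in J(w) reduces to a constant independent of w for this linear model, so minimizing V and minimizing J yield the same solution, consistent with the brief's first claim.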