Theoretical analysis of batch and on-line training for gradient descent learning in neural networks

Cited by: 46
Authors
Nakama, Takehiko [1 ]
Affiliation
[1] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
Keywords
Neural networks; Gradient descent learning; Batch training; On-line training; Quadratic loss functions; Convergence;
DOI
10.1016/j.neucom.2009.05.017
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each training scheme to the optimal weight using the absolute value of the expected difference (Measure 1) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by the scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if the variance decays exponentially, then on-line training converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme. (C) 2009 Elsevier B.V. All rights reserved.
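The setup described in the abstract can be illustrated numerically. The following is a minimal, hypothetical sketch (not the authors' code), assuming a one-dimensional quadratic loss whose per-instance gradient at weight w is (w - w*) plus zero-mean Gaussian noise of constant variance; the names w_star, eta, sigma, n, and run_once are illustrative choices. Batch training averages the n per-instance gradients before each update, while on-line training updates after every instance; Monte Carlo repetitions estimate Measure 1, |E[w - w*]|, and Measure 2, E[(w - w*)^2].

    # Illustrative sketch only: batch vs. on-line gradient descent on a
    # one-dimensional quadratic loss with constant per-instance gradient variance.
    import numpy as np

    rng = np.random.default_rng(0)

    w_star = 2.0   # optimal weight (assumed)
    n = 50         # training set size
    eta = 0.05     # learning rate
    epochs = 100   # passes over the training set
    sigma = 1.0    # std. dev. of the per-instance gradient noise
    runs = 500     # Monte Carlo repetitions

    def run_once(online):
        """One training run; the per-instance gradient is (w - w_star) plus noise."""
        w = 0.0
        for _ in range(epochs):
            noise = rng.normal(0.0, sigma, size=n)
            if online:
                # on-line: update after every instance
                for eps in noise:
                    w -= eta * ((w - w_star) + eps)
            else:
                # batch: one update per epoch with the averaged gradient
                w -= eta * ((w - w_star) + noise.mean())
        return w

    for online in (False, True):
        finals = np.array([run_once(online) for _ in range(runs)])
        m1 = abs(finals.mean() - w_star)       # Measure 1: |E[w - w*]|
        m2 = np.mean((finals - w_star) ** 2)   # Measure 2: E[(w - w*)^2]
        label = "on-line" if online else "batch"
        print(f"{label:7s}  Measure 1 ~= {m1:.4f}   Measure 2 ~= {m2:.4f}")

In this sketch, the on-line scheme's Measure 2 settles at a nonzero level set by eta and sigma, while the batch scheme's residual variance is smaller by roughly a factor of n, mirroring the abstract's observation that on-line training does not converge with respect to Measure 2 when the per-instance gradient variance stays constant.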
Pages: 151-159
Page count: 9
Related Papers
50 records in total
  • [1] Dynamics of on-line gradient descent learning for multilayer neural networks
    Saad, D
    Solla, SA
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 302 - 308
  • [2] Natural gradient descent for on-line learning
    Rattray, M
    Saad, D
    Amari, S
    [J]. PHYSICAL REVIEW LETTERS, 1998, 81 (24) : 5461 - 5464
  • [4] Training Morphological Neural Networks with Gradient Descent: Some Theoretical Insights
    Blusseau, Samy
    [J]. DISCRETE GEOMETRY AND MATHEMATICAL MORPHOLOGY, DGMM 2024, 2024, 14605 : 229 - 241
  • [5] Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks
    Becker, Martin
    Lippel, Jens
    Zielke, Thomas
    [J]. PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP, 2019, : 338 - 345
  • [6] The general inefficiency of batch training for gradient descent learning
    Wilson, DR
    Martinez, TR
    [J]. NEURAL NETWORKS, 2003, 16 (10) : 1429 - 1451
  • [7] ANALYSIS OF GRADIENT DESCENT LEARNING ALGORITHMS FOR MULTILAYER FEEDFORWARD NEURAL NETWORKS
    GUO, H
    GELFAND, SB
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, 1991, 38 (08) : 883 - 894
  • [8] Gradient descent learning for quaternionic Hopfield neural networks
    Kobayashi, Masaki
    [J]. NEUROCOMPUTING, 2017, 260 : 174 - 179
  • [9] Learning Graph Neural Networks with Approximate Gradient Descent
    Li, Qunwei
    Zou, Shaofeng
    Zhong, Wenliang
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8438 - 8446
  • [10] A gradient descent learning algorithm for fuzzy neural networks
    Feuring, T
    Buckley, JJ
    Hayashi, Y
    [J]. 1998 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AT THE IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE - PROCEEDINGS, VOL 1-2, 1998, : 1136 - 1141