Theoretical analysis of batch and on-line training for gradient descent learning in neural networks

Cited by: 46
Authors
Nakama, Takehiko [1]
Affiliation
[1] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
Keywords
Neural networks; Gradient descent learning; Batch training; On-line training; Quadratic loss functions; Convergence
DOI
10.1016/j.neucom.2009.05.017
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each training scheme to the optimal weight using the absolute value of the expected difference (Measure 1) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by the scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if the variance decays exponentially, then on-line training converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme. (C) 2009 Elsevier B.V. All rights reserved.
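As an illustration of the setting described in the abstract (not taken from the paper itself), the sketch below simulates batch and on-line gradient descent on a one-dimensional quadratic loss with noisy per-instance gradients and estimates Measure 1, |E[w* - w]|, and Measure 2, E[(w* - w)^2], by Monte Carlo. All constants and function names (N, ETA, SIGMA, W_OPT, run_batch, run_online, etc.) are assumptions chosen for the demo, not quantities defined in the paper.

```python
# Illustrative sketch only: batch vs. on-line gradient descent on a
# one-dimensional quadratic loss, with the two convergence measures
# from the abstract estimated by Monte Carlo. All parameters are
# demo assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N = 20         # training-set size (assumption)
ETA = 0.05     # learning rate (assumption)
EPOCHS = 200   # passes over the training set
SIGMA = 0.5    # std. dev. of the noise in each per-instance gradient
W_OPT = 1.0    # minimizer of the quadratic loss (w - W_OPT)**2 / 2
TRIALS = 2000  # Monte Carlo repetitions used to estimate the expectations

def run_batch():
    """One run of batch training: one update per epoch with the averaged gradient."""
    w = 0.0
    for _ in range(EPOCHS):
        grads = (w - W_OPT) + SIGMA * rng.standard_normal(N)  # noisy per-instance gradients
        w -= ETA * grads.mean()
    return w

def run_online():
    """One run of on-line training: an immediate update after every instance."""
    w = 0.0
    for _ in range(EPOCHS):
        for _ in range(N):
            g = (w - W_OPT) + SIGMA * rng.standard_normal()
            w -= ETA * g
    return w

for name, run in [("batch", run_batch), ("on-line", run_online)]:
    finals = np.array([run() for _ in range(TRIALS)])
    measure1 = abs(np.mean(W_OPT - finals))    # Measure 1: |E[w* - w]|
    measure2 = np.mean((W_OPT - finals) ** 2)  # Measure 2: E[(w* - w)^2]
    print(f"{name:8s}  Measure 1 ~ {measure1:.4f}   Measure 2 ~ {measure2:.4f}")
```

With a constant gradient-noise variance, the on-line run typically shows a noticeably larger Measure 2 than the batch run, which is consistent with the abstract's claim that on-line training does not converge to the optimal weight under Measure 2 unless the variance decays.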
Pages: 151-159
Number of pages: 9
Related Papers (50 in total)
  • [31] Adaptive stepsize algorithms for on-line training of neural networks
    Magoulas, GD
    Plagianakos, VP
    Vrahatis, MN
    [J]. NONLINEAR ANALYSIS-THEORY METHODS & APPLICATIONS, 2001, 47 (05) : 3425 - 3430
  • [32] Gradient Descent for Spiking Neural Networks
    Huh, Dongsung
    Sejnowski, Terrence J.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [33] INVERSION OF NEURAL NETWORKS BY GRADIENT DESCENT
    KINDERMANN, J
    LINDEN, A
    [J]. PARALLEL COMPUTING, 1990, 14 (03) : 277 - 286
  • [34] An adaptive gradient-descent-based neural networks for the on-line solution of linear time variant equations and its applications
    Cai, Jun
    Yi, Chenfu
    [J]. INFORMATION SCIENCES, 2023, 622 : 34 - 45
  • [35] Meta-learning spiking neural networks with surrogate gradient descent
    Stewart, Kenneth M.
    Neftci, Emre O.
    [J]. NEUROMORPHIC COMPUTING AND ENGINEERING, 2022, 2 (04):
  • [36] Is Learning in Biological Neural Networks Based on Stochastic Gradient Descent? An Analysis Using Stochastic Processes
    Christensen, Soeren
    Kallsen, Jan
    [J]. NEURAL COMPUTATION, 2024, 36 (07) : 1424 - 1432
  • [37] Stochastic Markov gradient descent and training low-bit neural networks
    Ashbrock, Jonathan
    Powell, Alexander M.
    [J]. SAMPLING THEORY SIGNAL PROCESSING AND DATA ANALYSIS, 2021, 19 (02):
  • [38] Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks
    Nitta, Tohru
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 193 - 198
  • [39] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2021, 64
  • [40] A proof of convergence for gradient descent in the training of artificial neural networks for constant functions
    Cheridito, Patrick
    Jentzen, Arnulf
    Riekert, Adrian
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2022, 72