Self-scaled conjugate gradient training algorithms

Cited by: 19
Authors
Kostopoulos, A. E. [1 ]
Grapsa, T. N. [1 ]
Affiliation
[1] Univ Patras, Dept Math, GR-26504 Patras, Greece
Keywords
Neural network; Training; Self-scaled conjugate gradient; Perry's method; Line search; Learning algorithms; Restart procedures; Convergence
DOI
10.1016/j.neucom.2009.04.006
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This article presents efficient training algorithms based on conjugate gradient optimization methods. In addition to the existing conjugate gradient training algorithms, we introduce Perry's conjugate gradient method as a training algorithm [A. Perry, A modified conjugate gradient algorithm, Operations Research 26 (1978) 26-43]. Perry's method has proven to be very efficient in the context of unconstrained optimization, but it has never been used in MLP training. Furthermore, we propose a new class of conjugate gradient (CG) methods, called self-scaled CG methods, derived from the principles of the Hestenes-Stiefel, Fletcher-Reeves, Polak-Ribiere and Perry methods. This class is based on the spectral scaling parameter introduced in [J. Barzilai, J.M. Borwein, Two-point step size gradient methods, IMA Journal of Numerical Analysis 8 (1988) 141-148]. The spectral scaling parameter incorporates second-order information without estimating the Hessian matrix. Furthermore, we incorporate into the CG training algorithms an efficient line search technique based on the Wolfe conditions and on safeguarded cubic interpolation [D.F. Shanno, K.H. Phua, Minimization of unconstrained multivariate functions, ACM Transactions on Mathematical Software 2 (1976) 87-94]. In addition, the initial learning rate parameter fed to the line search technique is automatically adapted at each iteration by a closed formula proposed in [D.F. Shanno, K.H. Phua, Minimization of unconstrained multivariate functions, ACM Transactions on Mathematical Software 2 (1976) 87-94; D.G. Sotiropoulos, A.E. Kostopoulos, T.N. Grapsa, A spectral version of Perry's conjugate gradient method for neural network training, in: D.T. Tsahalis (Ed.), Fourth GRACM Congress on Computational Mechanics, vol. 1, 2002, pp. 172-179]. Finally, an efficient restarting procedure is employed to further improve the effectiveness of the CG training algorithms. Experimental results show that, in general, the new class of methods performs better, with a much lower computational cost and a higher success rate. (C) 2009 Elsevier B.V. All rights reserved.
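A minimal Python sketch may help fix the moving parts described in the abstract. It combines a Barzilai-Borwein-scaled Polak-Ribiere direction (one member of the self-scaled class), a simple bisection-based weak-Wolfe line search standing in for the Shanno-Phua safeguarded cubic interpolation, and a Powell-style restart test. Everything here is illustrative: the function names, the restart threshold 0.2, and the choice of the Polak-Ribiere coefficient are assumptions, not the paper's exact formulas (in particular, the self-scaled Perry direction and the adaptive initial learning rate are not reproduced).

import numpy as np

def wolfe_line_search(f, grad, x, d, a0=1.0, c1=1e-4, c2=0.9, max_iter=30):
    # Enforce the weak Wolfe conditions by bisection/expansion. The paper
    # uses safeguarded cubic interpolation (Shanno & Phua, 1976) instead.
    f0 = f(x)
    g0 = grad(x) @ d                            # directional derivative, must be < 0
    lo, hi, a = 0.0, np.inf, a0
    for _ in range(max_iter):
        if f(x + a * d) > f0 + c1 * a * g0:     # sufficient decrease fails
            hi = a
        elif grad(x + a * d) @ d < c2 * g0:     # curvature condition fails
            lo = a
        else:
            return a
        a = 2.0 * lo if hi == np.inf else 0.5 * (lo + hi)
    return a

def train_sscg(f, grad, w0, max_epochs=500, tol=1e-5):
    # One self-scaled CG variant: d = -theta * g + beta * d_prev, where
    # theta is the Barzilai-Borwein spectral parameter (cheap second-order
    # information, no Hessian estimate) and beta is the Polak-Ribiere
    # coefficient.
    w = np.asarray(w0, dtype=float).copy()
    g = grad(w)
    d = -g                                # first direction: steepest descent
    for _ in range(max_epochs):
        if np.linalg.norm(g) < tol:
            break
        a = wolfe_line_search(f, grad, w, d)
        s = a * d                         # step s_k = w_{k+1} - w_k
        w = w + s
        g_new = grad(w)
        y = g_new - g                     # gradient difference y_k
        theta = (s @ s) / (s @ y)         # BB spectral scaling; s'y > 0 is
                                          # guaranteed by the Wolfe conditions
        beta = (g_new @ y) / (g @ g)      # Polak-Ribiere coefficient
        if abs(g_new @ g) >= 0.2 * (g_new @ g_new):
            d = -theta * g_new            # Powell-style restart
        else:
            d = -theta * g_new + beta * d # self-scaled CG direction
        g = g_new
    return w

# Quick check on a convex quadratic f(w) = 0.5 * w'Aw - b'w (minimizer A^{-1}b):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = train_sscg(lambda w: 0.5 * w @ A @ w - b @ w, lambda w: A @ w - b, np.zeros(2))

Treating f as a network's error function and grad as its gradient with respect to the weights recovers the MLP-training setting of the paper.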
Pages: 3000-3019
Page count: 20
Related Papers
50 results in total
  • [1] Scaled conjugate gradient algorithms for unconstrained optimization
    Andrei, Neculai
    Computational Optimization and Applications, 2007, 38(3): 401-416
  • [2] Self-scaled barriers for irreducible symmetric cones
    Hauser, R. A.; Lim, Y. D.
    SIAM Journal on Optimization, 2002, 12(3): 715-723
  • [3] Self-scaled barrier functions on symmetric cones and their classification
    Hauser, R. A.; Güler, O.
    Foundations of Computational Mathematics, 2002, 2(2): 121-143
  • [4] Analysis of weight initialization routines for scaled conjugate gradient training algorithm
    Masood, Sarfaraz; Doja, M. N.; Chandra, Pravin
    2016 Second International Conference on Computational Intelligence & Communication Technology (CICT), 2016: 533-538
  • [5] A new class of nonmonotone conjugate gradient training algorithms
    Livieris, Ioannis E.; Pintelas, Panagiotis
    Applied Mathematics and Computation, 2015, 266: 404-413
  • [6] A note on the global convergence theorem of the scaled conjugate gradient algorithms proposed by Andrei
    Babaie-Kafaki, Saman
    Computational Optimization and Applications, 2012, 52(2): 409-414
  • [7] Training bidirectional recurrent neural network architectures with the scaled conjugate gradient algorithm
    Agathocleous, Michalis; Christodoulou, Chris; Promponas, Vasilis; Kountouris, Petros; Vassiliades, Vassilis
    Artificial Neural Networks and Machine Learning - ICANN 2016, Part I, 2016, 9886: 123-131