Quasi-Newton Iteration in Deterministic Policy Gradient

Cited by: 0
Authors
Kordabad, Arash Bahari [1 ]
Esfahani, Hossein Nejatbakhsh [1 ]
Cai, Wenqi [1 ]
Gros, Sebastien [1 ]
Affiliation
[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway
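Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract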
This paper presents a model-free approximation of the Hessian of the performance of deterministic policies, for use in Reinforcement Learning methods based on Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy and allows for superlinear convergence in the learning, provided that the policy parametrization is sufficiently rich. The natural policy gradient method can be interpreted as a particular case of the proposed method. We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with that of the natural policy gradient in a nonlinear example.
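As a rough illustration of the quasi-Newton step the abstract refers to, the sketch below applies the generic update theta - step_size * H^{-1} * grad to a toy quadratic. It is not the paper's Hessian approximation: the quadratic objective, the matrix `A`, the vector `b`, and the function names are all hypothetical stand-ins, and the objective is treated as a cost to be minimized. The point is only that the choice of `H` decides the method: the exact Hessian solves the quadratic in one step, while the identity gives plain gradient descent.

```python
import numpy as np


def quasi_newton_step(theta, grad, hess_approx, step_size=1.0):
    """One generic quasi-Newton update: theta - step_size * H^{-1} grad."""
    # Solve H d = grad instead of forming the inverse explicitly.
    direction = np.linalg.solve(hess_approx, grad)
    return theta - step_size * direction


# Hypothetical quadratic stand-in for the policy performance:
# J(theta) = 0.5 * theta^T A theta - b^T theta (assumption, not from the paper).
A = np.array([[3.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
b = np.array([1.0, -2.0])


def grad_J(theta):
    """Gradient of the toy quadratic objective."""
    return A @ theta - b


theta0 = np.zeros(2)
theta_star = np.linalg.solve(A, b)  # exact minimizer of the toy objective

# With the exact Hessian, a single full step lands on the minimizer.
theta_newton = quasi_newton_step(theta0, grad_J(theta0), A)

# With H = I the same routine reduces to an ordinary gradient step.
theta_gd = quasi_newton_step(theta0, grad_J(theta0), np.eye(2), step_size=0.1)

print("minimizer:        ", theta_star)
print("after Newton step:", theta_newton)
print("after GD step:    ", theta_gd)
```

In a learning setting both the gradient and `hess_approx` would be estimated from data rather than known in closed form, which is where a model-free Hessian approximation would come in.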
Pages: 2124 - 2129
Number of pages: 6