Quasi-Newton Trust Region Policy Optimization

Cited by: 0
Authors
Jha, Devesh K. [1]
Raghunathan, Arvind U. [1]
Romeres, Diego [1]
Affiliation
[1] MERL, Cambridge, MA 02139 USA
Source
Keywords
Reinforcement Learning; Trust Region Optimization; Quasi-Newton Methods; Policy Gradient; GAME; GO;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We propose a trust region method for policy optimization that employs a Quasi-Newton approximation of the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls and has achieved state-of-the-art performance across a wide range of tasks. However, it suffers from several drawbacks, including the lack of a step-size selection criterion and slow convergence. We investigate a trust region method for policy optimization that uses a dogleg step and a Quasi-Newton approximation of the Hessian. Numerical experiments over a wide range of challenging continuous control tasks demonstrate that this choice is sample-efficient and improves performance.
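The two ingredients named in the abstract, a dogleg step and a Quasi-Newton Hessian approximation, can be illustrated with a generic textbook sketch. This is not the authors' implementation: it assumes a Euclidean trust region of radius `delta`, a symmetric positive definite approximation `B`, and a standard BFGS update. The function names `dogleg_step` and `bfgs_update` are hypothetical, and the paper's own constraint and update rule may differ.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Approximately minimize m(p) = g.p + 0.5 * p.B.p subject to ||p|| <= delta.

    Assumes B is symmetric positive definite (an assumption of this sketch).
    """
    # Full quasi-Newton step: take it if it lies inside the trust region.
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton

    # Cauchy point: minimizer of the model along the steepest-descent direction.
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    norm_c = np.linalg.norm(p_cauchy)
    if norm_c >= delta:
        # Even the Cauchy point is outside: scale it back to the boundary.
        return (delta / norm_c) * p_cauchy

    # Otherwise walk from the Cauchy point toward the Newton point and stop
    # where the path crosses the boundary: solve ||p_cauchy + tau * d|| = delta.
    d = p_newton - p_cauchy
    a = d @ d
    b = 2.0 * (p_cauchy @ d)
    c = p_cauchy @ p_cauchy - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + tau * d

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B.

    s is the step taken and y the change in gradient. The update is skipped
    when the curvature condition s.y > 0 fails, so B stays positive definite.
    """
    sy = s @ y
    if sy <= 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy
```

In a full trust region loop, the radius `delta` would be grown or shrunk depending on how well the quadratic model predicted the actual change in the objective, which is what supplies the step-size selection that plain gradient descent lacks.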
Pages: 10