ACCELERATING REINFORCEMENT LEARNING WITH A DIRECTIONAL-GAUSSIAN-SMOOTHING EVOLUTION STRATEGY

Cited: 1
Authors:
Zhang, Jiaxin [1 ]
Tran, Hoang [1 ]
Zhang, Guannan [1 ]
Affiliations:
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Source:
ELECTRONIC RESEARCH ARCHIVE | 2021, Vol. 29, No. 6
Keywords:
  Reinforcement learning; Gaussian smoothing; non-convex optimization; stochastic control; high-dimensional optimization; Gauss-Hermite quadrature
DOI:
10.3934/era.2021075
Chinese Library Classification:
O1 [Mathematics]
Discipline Codes:
0701; 070101
Abstract:
The objective of reinforcement learning (RL) is to find an optimal strategy for solving a dynamical control problem. Evolution strategy (ES) has shown great promise in many challenging RL tasks, where the underlying dynamical system is only accessible as a black box such that adjoint methods cannot be used. However, existing ES methods have two limitations that hinder their applicability in RL. First, most existing methods rely on Monte Carlo based gradient estimators to generate search directions. Due to the low accuracy of Monte Carlo estimators, RL training suffers from slow convergence and requires more iterations to reach the optimal solution. Second, the landscape of the reward function can be deceptive and may contain many local maxima, causing ES algorithms to converge prematurely and fail to explore other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolution Strategy (DGS-ES) to accelerate RL training, which is well suited to address these two challenges through its ability to (i) provide gradient estimates with high accuracy, and (ii) find nonlocal search directions that emphasize large-scale variation of the reward function and disregard local fluctuations. Through several benchmark RL tasks demonstrated herein, we show that the DGS-ES method is highly scalable, achieves superior wall-clock time, and attains reward scores competitive with other popular policy gradient and ES approaches.
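The core idea the abstract describes can be sketched as follows: smooth the reward along each direction of a random orthonormal basis, estimate each smoothed directional derivative with Gauss-Hermite quadrature, and assemble the results into a nonlocal gradient estimate. This is a minimal illustrative sketch based only on the abstract's description, not the paper's reference implementation; the function names, defaults (`sigma`, `num_quad`), and the QR-based choice of basis are assumptions.

```python
import numpy as np

def dgs_gradient(f, theta, sigma=1.0, num_quad=5, rng=None):
    """Sketch of a directional-Gaussian-smoothing gradient estimate of f at theta.

    For each column xi_i of a random orthonormal basis, the derivative of the
    Gaussian-smoothed 1D cross-section s -> E_{v~N(0,1)}[f(theta + (s + sigma*v)*xi_i)]
    at s = 0 is approximated with Gauss-Hermite quadrature; the directional
    derivatives are then assembled into a full gradient estimate.
    """
    rng = np.random.default_rng(rng)
    d = len(theta)
    # Random orthonormal basis via QR of a Gaussian matrix.
    xi, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Physicists' Gauss-Hermite nodes/weights for the weight exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(num_quad)
    grad = np.zeros(d)
    for i in range(d):
        # Evaluate f along direction xi_i at the quadrature nodes
        # (change of variables v = sqrt(2) * x for the N(0,1) density).
        vals = np.array([f(theta + np.sqrt(2.0) * sigma * xm * xi[:, i]) for xm in x])
        # Smoothed directional derivative: (1/sigma) * E[v * f(theta + sigma*v*xi_i)].
        d_i = (np.sqrt(2.0) / (sigma * np.sqrt(np.pi))) * np.sum(w * x * vals)
        grad += d_i * xi[:, i]
    return grad
```

Because Gauss-Hermite quadrature with M nodes is exact for polynomials up to degree 2M-1, the estimate above is exact (up to floating-point error) for quadratic objectives, in contrast to the noisy Monte Carlo estimators the abstract criticizes; a large `sigma` trades local accuracy for the nonlocal, landscape-level search direction.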
Pages: 4119-4135 (17 pages)
Related Papers (50 total):
  • [1] Accelerating deep reinforcement learning model for game strategy
    Li, Yifan
    Fang, Yuchun
    Akhtar, Zahid
    [J]. NEUROCOMPUTING, 2020, 408 : 157 - 168
  • [2] Memetic Evolution Strategy for Reinforcement Learning
    Qu, Xinghua
    Ong, Yew-Soon
    Hou, Yaqing
    Shen, Xiaobo
    [J]. 2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019, : 1922 - 1928
  • [3] Intelligent Algorithmic Trading Strategy Using Reinforcement Learning and Directional Change
    Aloud, Monira Essa
    Alkhamees, Nora
    [J]. IEEE ACCESS, 2021, 9 : 114659 - 114671
  • [4] Adaptive evolution strategy with ensemble of mutations for Reinforcement Learning
    Ajani, Oladayo S.
    Mallipeddi, Rammohan
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [5] A residual smoothing strategy for accelerating Newton method continuation
    Mavriplis, Dimitri J.
    [J]. COMPUTERS & FLUIDS, 2021, 220
  • [6] Accelerating Reinforcement Learning by Mirror Images
    Kitao, Takehiro
    Miura, Takao
    [J]. INFORMATION MODELLING AND KNOWLEDGE BASES XXVIII, 2017, 292 : 145 - 160
  • [7] Accelerating Quadratic Optimization with Reinforcement Learning
    Ichnowski, Jeffrey
    Jain, Paras
    Stellato, Bartolomeo
    Banjac, Goran
    Luo, Michael
    Borrelli, Francesco
    Gonzalez, Joseph E.
    Stoica, Ion
    Goldberg, Ken
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Accelerating Reinforcement Learning with Suboptimal Guidance
    Bohn, Eivind
    Moe, Signe
    Johansen, Tor Arne
    [J]. IFAC PAPERSONLINE, 2020, 53 (02): 8090 - 8096
  • [9] Gaussian processes in reinforcement learning
    Rasmussen, CE
    Kuss, M
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 751 - 758