ACCELERATING REINFORCEMENT LEARNING WITH A DIRECTIONAL-GAUSSIAN-SMOOTHING EVOLUTION STRATEGY

Cited: 1
Authors:
Zhang, Jiaxin [1 ]
Tran, Hoang [1 ]
Zhang, Guannan [1 ]
Affiliations:
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Source:
ELECTRONIC RESEARCH ARCHIVE | 2021, Vol. 29, No. 6
Keywords:
  Reinforcement learning; Gaussian smoothing; non-convex optimization; stochastic control; high-dimensional optimization; Gauss-Hermite quadrature
DOI:
10.3934/era.2021075
Chinese Library Classification:
O1 [Mathematics]
Discipline Codes:
0701; 070101
Abstract:
The objective of reinforcement learning (RL) is to find an optimal strategy for solving a dynamical control problem. Evolution strategy (ES) has shown great promise in many challenging RL tasks, where the underlying dynamical system is only accessible as a black box such that adjoint methods cannot be used. However, existing ES methods have two limitations that hinder their applicability in RL. First, most existing methods rely on Monte Carlo based gradient estimators to generate search directions. Due to the low accuracy of Monte Carlo estimators, RL training suffers from slow convergence and requires more iterations to reach the optimal solution. Second, the landscape of the reward function can be deceptive and may contain many local maxima, causing ES algorithms to converge prematurely and fail to explore other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolution Strategy (DGS-ES) to accelerate RL training, which is well suited to address these two challenges through its ability to (i) provide gradient estimates with high accuracy, and (ii) find nonlocal search directions that emphasize large-scale variation of the reward function and disregard local fluctuations. Through several benchmark RL tasks demonstrated herein, we show that the DGS-ES method is highly scalable, achieves superior wall-clock time, and attains reward scores competitive with other popular policy gradient and ES approaches.
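The core idea the abstract describes can be sketched as follows: smooth the reward along each direction of a random orthonormal basis, estimate each smoothed directional derivative with Gauss-Hermite quadrature, and assemble the results into a nonlocal gradient estimate. This is a minimal illustrative sketch based only on the abstract's description, not the paper's reference implementation; the function names, defaults (`sigma`, `num_quad`), and the QR-based choice of basis are assumptions.

```python
import numpy as np

def dgs_gradient(f, theta, sigma=1.0, num_quad=5, rng=None):
    """Sketch of a directional-Gaussian-smoothing gradient estimate of f at theta.

    For each column xi_i of a random orthonormal basis, the derivative of the
    Gaussian-smoothed 1D cross-section s -> E_{v~N(0,1)}[f(theta + (s + sigma*v)*xi_i)]
    at s = 0 is approximated with Gauss-Hermite quadrature; the directional
    derivatives are then assembled into a full gradient estimate.
    """
    rng = np.random.default_rng(rng)
    d = len(theta)
    # Random orthonormal basis via QR of a Gaussian matrix.
    xi, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Physicists' Gauss-Hermite nodes/weights for the weight exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(num_quad)
    grad = np.zeros(d)
    for i in range(d):
        # Evaluate f along direction xi_i at the quadrature nodes
        # (change of variables v = sqrt(2) * x for the N(0,1) density).
        vals = np.array([f(theta + np.sqrt(2.0) * sigma * xm * xi[:, i]) for xm in x])
        # Smoothed directional derivative: (1/sigma) * E[v * f(theta + sigma*v*xi_i)].
        d_i = (np.sqrt(2.0) / (sigma * np.sqrt(np.pi))) * np.sum(w * x * vals)
        grad += d_i * xi[:, i]
    return grad
```

Because Gauss-Hermite quadrature with M nodes is exact for polynomials up to degree 2M-1, the estimate above is exact (up to floating-point error) for quadratic objectives, in contrast to the noisy Monte Carlo estimators the abstract criticizes; a large `sigma` trades local accuracy for the nonlocal, landscape-level search direction.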
Pages: 4119-4135 (17 pages)
Related Papers (50 total):
  • [1] Accelerating deep reinforcement learning model for game strategy
    Li, Yifan
    Fang, Yuchun
    Akhtar, Zahid
    [J]. NEUROCOMPUTING, 2020, 408 : 157 - 168
  • [2] Memetic Evolution Strategy for Reinforcement Learning
    Qu, Xinghua
    Ong, Yew-Soon
    Hou, Yaqing
    Shen, Xiaobo
    [J]. 2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019, : 1922 - 1928
  • [3] Intelligent Algorithmic Trading Strategy Using Reinforcement Learning and Directional Change
    Aloud, Monira Essa
    Alkhamees, Nora
    [J]. IEEE ACCESS, 2021, 9 : 114659 - 114671
  • [4] Adaptive evolution strategy with ensemble of mutations for Reinforcement Learning
    Ajani, Oladayo S.
    Mallipeddi, Rammohan
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [5] A residual smoothing strategy for accelerating Newton method continuation
    Mavriplis, Dimitri J.
    [J]. COMPUTERS & FLUIDS, 2021, 220
  • [6] Accelerating Reinforcement Learning by Mirror Images
    Kitao, Takehiro
    Miura, Takao
    [J]. INFORMATION MODELLING AND KNOWLEDGE BASES XXVIII, 2017, 292 : 145 - 160
  • [7] Accelerating Quadratic Optimization with Reinforcement Learning
    Ichnowski, Jeffrey
    Jain, Paras
    Stellato, Bartolomeo
    Banjac, Goran
    Luo, Michael
    Borrelli, Francesco
    Gonzalez, Joseph E.
    Stoica, Ion
    Goldberg, Ken
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Accelerating Reinforcement Learning with Suboptimal Guidance
    Bohn, Eivind
    Moe, Signe
    Johansen, Tor Arne
    [J]. IFAC PAPERSONLINE, 2020, 53 (02): 8090 - 8096
  • [9] Gaussian processes in reinforcement learning
    Rasmussen, CE
    Kuss, M
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 751 - 758