Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

Cited: 0
|
Authors
Veeriah, Vivek [1 ]
van Seijen, Harm [1 ,2 ]
Sutton, Richard S. [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Univ Alberta, Edmonton, AB, Canada
Source
AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS | 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Reinforcement Learning; Actor-Critic; Policy Gradient; Nonlinear Function Approximation; Incremental Learning;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed-learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, this approach was limited to action-value methods. In this paper, we extend it to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates on control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on mountain-car and pole-balancing tasks. In our experiments, the forward actor-critic dramatically outperforms the conventional actor-critic on these standard control tasks. Notably, the forward actor-critic method gives rise to a new class of multi-step RL algorithms without eligibility traces.
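The core idea summarized above, replacing eligibility traces with updates that are simply delayed until the multi-step return has actually been observed, can be sketched briefly. The Python snippet below is an illustrative sketch only, not the paper's algorithm: it assumes a hypothetical environment whose step method returns (next state, reward, done), and user-supplied, possibly nonlinear, policy and value approximators together with their gradients (policy, value_fn, grad_log_policy, grad_value, all hypothetical names). The actor and critic updates for time step t are applied only once the n-step return for t is known.

```python
# Illustrative sketch (not the authors' exact algorithm): a forward-view
# n-step actor-critic in which the update for time step t is delayed until
# the n-step return G_t^(n) can be computed from observed rewards,
# avoiding eligibility traces. Parameters theta and w are assumed to be
# numpy-style arrays; the approximators and env are hypothetical.
from collections import deque

def forward_actor_critic_episode(env, policy, value_fn,
                                 grad_log_policy, grad_value,
                                 theta, w, n=4, gamma=0.99,
                                 alpha_actor=1e-3, alpha_critic=1e-2):
    """Run one episode, applying each actor/critic update n steps late."""
    buffer = deque()                      # pending (state, action, reward) tuples
    s = env.reset()
    done = False
    while not done or buffer:
        if not done:
            a = policy(s, theta)          # sample an action from the current policy
            s_next, r, done = env.step(a)
            buffer.append((s, a, r))
            s = s_next
        # Once n transitions are buffered (or the episode has ended), the
        # oldest transition's n-step return is fully determined: update now.
        if len(buffer) == n or (done and buffer):
            s_t, a_t, _ = buffer[0]
            G = sum(gamma**k * r_k for k, (_, _, r_k) in enumerate(buffer))
            if not done:                  # bootstrap from the critic at s_{t+n}
                G += gamma**len(buffer) * value_fn(s, w)
            delta = G - value_fn(s_t, w)  # advantage-style TD error
            w = w + alpha_critic * delta * grad_value(s_t, w)
            theta = theta + alpha_actor * delta * grad_log_policy(s_t, a_t, theta)
            buffer.popleft()
    return theta, w
```

The design point the sketch is meant to convey is that no trace vectors are maintained; instead, each update is postponed by at most n steps, which is what makes the scheme compatible with nonlinear function approximators.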
Pages: 556-564
Page count: 9