Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

Cited: 0
Authors
Veeriah, Vivek [1 ]
van Seijen, Harm [1 ,2 ]
Sutton, Richard S. [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Univ Alberta, Edmonton, AB, Canada
Source
AAMAS'17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Reinforcement Learning; Actor-Critic; Policy Gradient; Nonlinear Function Approximation; Incremental Learning;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed-learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, that approach was limited to action-value methods. In this paper, we extend it to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on mountain car and pole-balancing tasks. In our experiments, forward actor-critic dramatically outperforms the conventional actor-critic on these standard control tasks. Notably, the forward actor-critic method gives rise to a new class of multi-step RL algorithms that do not require eligibility traces.
Pages: 556-564
Number of pages: 9
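
The abstract describes computing multi-step returns in the forward view and delaying each state's update until that return is available, instead of maintaining eligibility traces. The sketch below is a minimal, assumed illustration of that delayed-update idea for a linear critic and a linear-softmax actor with n-step returns; the class name, feature-vector interface, and hyperparameter defaults are illustrative assumptions, not the paper's exact forward actor-critic algorithm.

import numpy as np

class ForwardViewActorCritic:
    """Sketch of an n-step forward-view actor-critic with delayed updates.

    The update for the state visited at time t is applied only after the
    next n rewards are observed, rather than propagating credit backward
    with eligibility traces. Feature vectors phi are assumed to be
    1-D numpy arrays of length n_features.
    """

    def __init__(self, n_features, n_actions, gamma=0.99, n=4,
                 alpha_v=0.1, alpha_pi=0.01):
        self.w = np.zeros(n_features)                    # linear critic weights
        self.theta = np.zeros((n_actions, n_features))   # softmax policy weights
        self.gamma, self.n = gamma, n
        self.alpha_v, self.alpha_pi = alpha_v, alpha_pi
        self.buffer = []                                  # pending (phi, action, reward)

    def value(self, phi):
        return float(self.w @ phi)

    def policy(self, phi):
        prefs = self.theta @ phi
        prefs -= prefs.max()                              # numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

    def act(self, phi, rng=np.random):
        return rng.choice(len(self.theta), p=self.policy(phi))

    def step(self, phi, action, reward, phi_next, done):
        """Record a transition and perform any updates that are now due."""
        self.buffer.append((phi, action, reward))
        if len(self.buffer) >= self.n:
            self._update_oldest(phi_next, bootstrap=not done)
        if done:
            # Flush the remaining delayed updates at episode end (no bootstrap).
            while self.buffer:
                self._update_oldest(phi_next, bootstrap=False)

    def _update_oldest(self, phi_boot, bootstrap):
        phi_t, a_t, _ = self.buffer[0]
        # Forward-view n-step return: discounted sum of the buffered rewards
        # plus a bootstrapped value of the state reached n steps later.
        g = self.value(phi_boot) if bootstrap else 0.0
        for (_, _, r) in reversed(self.buffer):
            g = r + self.gamma * g
        delta = g - self.value(phi_t)                     # advantage estimate
        self.w += self.alpha_v * delta * phi_t            # critic update
        grad_log = -np.outer(self.policy(phi_t), phi_t)   # softmax log-gradient
        grad_log[a_t] += phi_t
        self.theta += self.alpha_pi * delta * grad_log    # actor update
        self.buffer.pop(0)

At each environment step the agent stores the transition; once n rewards have accumulated (or the episode ends), the actor and critic updates for the state observed n steps earlier are applied using the now fully determined return, which is what replaces the backward-view eligibility-trace mechanism in this sketch.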