Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

Times Cited: 0

Authors:

Veeriah, Vivek [1]

van Seijen, Harm [1,2]

Sutton, Richard S. [1]

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada

[2] Univ Alberta, Edmonton, AB, Canada

Source:

AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS | 2017

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Reinforcement Learning; Actor-Critic; Policy Gradient; Nonlinear Function Approximation; Incremental Learning;

DOI:

Not available

Chinese Library Classification (CLC) Number:

TP [Automation Technology, Computer Technology];

Discipline Classification Code:

0812;

Abstract:

Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of implementing them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, this approach was limited to action-value methods. In this paper, we extend it to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on the mountain car and pole-balancing tasks. In our experiments, forward actor-critic dramatically outperforms conventional actor-critic on these standard control tasks. Notably, forward actor-critic constitutes a new class of multi-step RL algorithms that do not use eligibility traces.
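The abstract refers to n-step returns and to delayed updates. As a rough illustration, assuming the standard textbook definition G_{t:t+n} = R_{t+1} + γR_{t+2} + ... + γ^{n-1}R_{t+n} + γ^n V(S_{t+n}), the sketch below computes truncated n-step returns and the corresponding advantage estimates (G_{t:t+n} - V(S_t)) for a finished episode. It is a generic helper, not the forward actor-critic algorithm from the paper; the function names `n_step_returns` and `advantages` and the episode layout (rewards[t] = R_{t+1}, values[t] = V(S_t)) are hypothetical choices made for this example.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], values: Sequence[float],
                   n: int, gamma: float) -> List[float]:
    """Truncated n-step return G_{t:t+n} for every step t of a finished episode.

    Hypothetical layout: rewards[t] is R_{t+1}, values[t] is the critic's V(S_t).
    The episode is assumed to terminate after the last reward, so no
    bootstrapping happens once the n-step window reaches the terminal state.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    T = len(rewards)
    returns: List[float] = []
    for t in range(T):
        horizon = min(t + n, T)
        # Discounted sum of the (up to) n rewards observed after S_t.
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
        # Bootstrap from V(S_{t+n}) only if S_{t+n} is non-terminal.
        if horizon < T:
            g += gamma ** n * values[horizon]
        returns.append(g)
    return returns


def advantages(rewards: Sequence[float], values: Sequence[float],
               n: int, gamma: float) -> List[float]:
    """n-step advantage estimates G_{t:t+n} - V(S_t), as an actor might use them."""
    returns = n_step_returns(rewards, values, n, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]
    print(n_step_returns(rewards, values, n=2, gamma=0.9))
    print(advantages(rewards, values, n=2, gamma=0.9))
```

Because the target for step t needs R_{t+1}, ..., R_{t+n} and S_{t+n}, an online learner can only apply the update for S_t about n steps after visiting it; this is the general sense in which forward-view multi-step methods delay their updates. With n = 1 the helper reduces to the one-step TD target, and with n at least the episode length it gives the Monte Carlo return.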

Pages: 556-564

Page count: 9