Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

被引:0
|
作者
Veeriah, Vivek [1 ]
van Seijen, Harm [1 ,2 ]
Sutton, Richard S. [1 ]
机构
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Univ Alberta, Edmonton, AB, Canada
来源
AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS | 2017年
基金
加拿大自然科学与工程研究理事会;
关键词
Reinforcement Learning; Actor-Critic; Policy Gradient; Nonlinear Function Approximation; Incremental Learning;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, works well with linear function approximators. Recently, van Seijen (2016) had introduced a delayed learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, this was limited to action-value methods. In this paper, we extend this approach to handle n-step returns, generalize this approach to policy gradient methods and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically investigate our proposed methods with the conventional actor-critic method on mountain car and pole-balancing tasks. From our experiments, we observe that forward actor-critic dramatically outperforms the conventional actor-critic in these standard control tasks. Notably, this forward actor-critic method has produced a new class of multi-step RL algorithms without eligibility traces.
引用
收藏
页码:556 / 564
页数:9
相关论文
共 50 条
  • [41] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [42] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16): : 10335 - 10349
  • [43] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [44] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339
  • [45] Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning
    Lee, Junseo
    Heo, Jaeseok
    Kim, Dohyeong
    Lee, Gunmin
    Oh, Songhwai
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 7568 - 7573
  • [46] Dynamic Charging Scheme Problem With Actor-Critic Reinforcement Learning
    Yang, Meiyi
    Liu, Nianbo
    Zuo, Lin
    Feng, Yong
    Liu, Minghui
    Gong, Haigang
    Liu, Ming
    IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (01) : 370 - 380
  • [47] Pseudorehearsal in actor-critic agents with neural network function approximation
    Marochko, Vladimir
    Johard, Leonard
    Mazzara, Manuel
    Longo, Luca
    PROCEEDINGS 2018 IEEE 32ND INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2018, : 644 - 650
  • [48] Optimal synchronized control of nonlinear coupled harmonic oscillators based on actor-critic reinforcement learning
    Gu, Zhiyang
    Fan, Chengli
    Yu, Dengxiu
    Wang, Zhen
    NONLINEAR DYNAMICS, 2023, 111 (22) : 21051 - 21064
  • [49] An extension of Genetic Network Programming with Reinforcement Learning using actor-critic
    Hatakeyama, Hiroyuki
    Mabu, Shingo
    Hirasawa, Kotaro
    Hu, Jinglu
    2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006, : 1522 - +
  • [50] Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization
    Chen, Jingdi
    Wang, Yimeng
    Lan, Tian
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021,