Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

Cited: 0
Authors
Veeriah, Vivek [1 ]
van Seijen, Harm [1 ,2 ]
Sutton, Richard S. [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Univ Alberta, Edmonton, AB, Canada
Source
AAMAS'17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Reinforcement Learning; Actor-Critic; Policy Gradient; Nonlinear Function Approximation; Incremental Learning;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed-learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, that approach was limited to action-value methods. In this paper, we extend it to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on mountain car and pole-balancing tasks. In our experiments, forward actor-critic dramatically outperforms the conventional actor-critic on these standard control tasks. Notably, the forward actor-critic method gives rise to a new class of multi-step RL algorithms that do not require eligibility traces.
Pages: 556-564
Number of pages: 9
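
The abstract describes computing multi-step returns in the forward view and delaying each state's update until that return is available, instead of maintaining eligibility traces. The sketch below is a minimal, assumed illustration of that delayed-update idea for a linear critic and a linear-softmax actor with n-step returns; the class name, feature-vector interface, and hyperparameter defaults are illustrative assumptions, not the paper's exact forward actor-critic algorithm.

import numpy as np

class ForwardViewActorCritic:
    """Sketch of an n-step forward-view actor-critic with delayed updates.

    The update for the state visited at time t is applied only after the
    next n rewards are observed, rather than propagating credit backward
    with eligibility traces. Feature vectors phi are assumed to be
    1-D numpy arrays of length n_features.
    """

    def __init__(self, n_features, n_actions, gamma=0.99, n=4,
                 alpha_v=0.1, alpha_pi=0.01):
        self.w = np.zeros(n_features)                    # linear critic weights
        self.theta = np.zeros((n_actions, n_features))   # softmax policy weights
        self.gamma, self.n = gamma, n
        self.alpha_v, self.alpha_pi = alpha_v, alpha_pi
        self.buffer = []                                  # pending (phi, action, reward)

    def value(self, phi):
        return float(self.w @ phi)

    def policy(self, phi):
        prefs = self.theta @ phi
        prefs -= prefs.max()                              # numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

    def act(self, phi, rng=np.random):
        return rng.choice(len(self.theta), p=self.policy(phi))

    def step(self, phi, action, reward, phi_next, done):
        """Record a transition and perform any updates that are now due."""
        self.buffer.append((phi, action, reward))
        if len(self.buffer) >= self.n:
            self._update_oldest(phi_next, bootstrap=not done)
        if done:
            # Flush the remaining delayed updates at episode end (no bootstrap).
            while self.buffer:
                self._update_oldest(phi_next, bootstrap=False)

    def _update_oldest(self, phi_boot, bootstrap):
        phi_t, a_t, _ = self.buffer[0]
        # Forward-view n-step return: discounted sum of the buffered rewards
        # plus a bootstrapped value of the state reached n steps later.
        g = self.value(phi_boot) if bootstrap else 0.0
        for (_, _, r) in reversed(self.buffer):
            g = r + self.gamma * g
        delta = g - self.value(phi_t)                     # advantage estimate
        self.w += self.alpha_v * delta * phi_t            # critic update
        grad_log = -np.outer(self.policy(phi_t), phi_t)   # softmax log-gradient
        grad_log[a_t] += phi_t
        self.theta += self.alpha_pi * delta * grad_log    # actor update
        self.buffer.pop(0)

At each environment step the agent stores the transition; once n rewards have accumulated (or the episode ends), the actor and critic updates for the state observed n steps earlier are applied using the now fully determined return, which is what replaces the backward-view eligibility-trace mechanism in this sketch.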