On the sample complexity of actor-critic method for reinforcement learning with function approximation

Cited by: 0
Authors
Harshat Kumar
Alec Koppel
Alejandro Ribeiro
Affiliations
[1] The University of Pennsylvania, Department of Electrical and Systems Engineering
[2] JPMorgan AI Research
Source
Machine Learning | 2023, Vol. 112
Keywords
Actor-critic; Reinforcement learning; Markov decision process; Non-convex optimization; Stochastic programming
DOI
Not available
Abstract
Reinforcement learning, mathematically described by Markov decision processes, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps that estimate the value function and policy gradient updates. Because the updates exhibit correlated noise and biased gradients, only the asymptotic behavior of actor-critic is known, established by connecting its iterates to dynamical systems. This work puts forth a new variant of actor-critic that employs Monte Carlo rollouts during the policy search updates, which results in controllable bias that depends on the number of critic evaluations. As a result, we are able to provide for the first time the convergence rate of actor-critic algorithms when the policy search step employs policy gradient, agnostic to the choice of policy evaluation technique. In particular, we establish conditions under which the sample complexity is comparable to that of the stochastic gradient method for non-convex problems, or slower as a result of the critic estimation error, which is the main complexity bottleneck. These results hold in continuous state and action spaces with linear function approximation for the value function. We then specialize these conceptual results to the case where the critic is estimated by Temporal Difference, Gradient Temporal Difference, and Accelerated Gradient Temporal Difference learning. These rates are then corroborated on a navigation problem involving an obstacle and on the pendulum problem, which together provide insight into the interplay between optimization and generalization in reinforcement learning.
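The alternating scheme the abstract describes, a critic fit by temporal-difference learning interleaved with Monte Carlo policy-gradient updates for the actor, can be sketched as follows. This is a minimal illustrative toy, not the paper's algorithm: the chain MDP, one-hot features, fixed horizon, and step sizes are all invented for the example (the paper uses random-horizon rollouts and general linear features in continuous spaces).

```python
import numpy as np

# Illustrative actor-critic sketch (not the paper's exact method):
# critic = linear value function fit by TD(0);
# actor  = softmax policy updated by Monte Carlo policy gradient,
#          with the critic serving as a baseline.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.95
HORIZON = 30  # rollout length for each Monte Carlo estimate

def features(s):
    """One-hot state features (a special case of linear approximation)."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

def step(s, a):
    """Toy chain MDP: action 1 moves right, action 0 left; reward at the end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def policy(theta, s):
    """Softmax policy over actions, tabular parameters per state."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

theta = np.zeros((N_STATES, N_ACTIONS))  # actor parameters
w = np.zeros(N_STATES)                   # critic weights

for episode in range(500):
    s, traj = 0, []
    for t in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=policy(theta, s))
        s2, r = step(s, a)
        # Critic step: TD(0) update of the linear value function.
        td_err = r + GAMMA * w @ features(s2) - w @ features(s)
        w += 0.1 * td_err * features(s)
        traj.append((s, a, r))
        s = s2
    # Actor step: Monte Carlo policy gradient, critic as baseline.
    G = 0.0
    for (s_t, a_t, r_t) in reversed(traj):
        G = r_t + GAMMA * G
        adv = G - w @ features(s_t)          # advantage estimate
        grad = -policy(theta, s_t)           # d log pi / d theta[s_t]
        grad[a_t] += 1.0
        theta[s_t] += 0.05 * adv * grad
```

The interleaving above mirrors the structure the abstract analyzes: the quality of the critic's value estimate directly controls the bias of the actor's gradient step, which is why critic estimation error is the sample-complexity bottleneck.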
Pages: 2433-2467
Page count: 34