On the sample complexity of actor-critic method for reinforcement learning with function approximation

Cited by: 0
Authors
Harshat Kumar
Alec Koppel
Alejandro Ribeiro
Affiliations
[1] The University of Pennsylvania, Department of Electrical and Systems Engineering
[2] JPMorgan AI Research
Source
Machine Learning, 2023, Vol. 112
Keywords
Actor-critic; Reinforcement learning; Markov decision process; Non-convex optimization; Stochastic programming
Abstract
Reinforcement learning, mathematically described by Markov decision processes, may be approached through either dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps that estimate the value function and policy gradient updates. Because these updates exhibit correlated noise and biased gradients, only the asymptotic behavior of actor-critic has been known, established by connecting the algorithm to dynamical systems. This work puts forth a new variant of actor-critic that employs Monte Carlo rollouts during the policy search updates, which results in a controllable bias that depends on the number of critic evaluations. As a result, we are able to provide, for the first time, the convergence rate of actor-critic algorithms whose policy search step employs a policy gradient update, agnostic to the choice of policy evaluation technique. In particular, we establish conditions under which the sample complexity is comparable to that of the stochastic gradient method for non-convex problems, or slower as a result of the critic estimation error, which is the main complexity bottleneck. These results hold in continuous state and action spaces with linear function approximation of the value function. We then specialize these conceptual results to the cases where the critic is estimated by temporal difference (TD), gradient temporal difference (GTD), and accelerated gradient temporal difference (A-GTD) learning. The resulting learning rates are corroborated on a navigation problem with an obstacle and on the pendulum problem, which together provide insight into the interplay between optimization and generalization in reinforcement learning.
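To make the alternation concrete, below is a minimal sketch of an actor-critic loop in the spirit of the abstract: a linear critic updated by TD(0) and a policy gradient actor step whose rollout horizon is drawn from a geometric distribution, so the rollout length controls the bias of the gradient estimate. The toy dynamics, identity feature map, unit-variance Gaussian policy, and step sizes are illustrative assumptions, not the construction analyzed in the paper.

```python
import numpy as np

# Minimal actor-critic sketch: linear TD(0) critic plus a policy
# gradient actor driven by a Monte Carlo rollout whose horizon is
# drawn geometrically. Placeholder environment and policy throughout.

rng = np.random.default_rng(seed=0)
dim, gamma = 3, 0.95
theta = np.zeros(dim)        # actor parameters: mean of a Gaussian policy
w = np.zeros(dim)            # critic parameters: linear value function
alpha_actor, alpha_critic = 1e-3, 1e-2

def phi(s):
    """Assumed critic feature map (here, the identity)."""
    return s

def env_step(s, a):
    """Toy linear dynamics with a scalar action and quadratic cost."""
    s_next = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal(dim)
    return s_next, -float(s @ s)

for episode in range(500):
    s = rng.standard_normal(dim)
    # Rollout horizon T ~ Geometric(1 - gamma): the Monte Carlo rollout
    # length that controls the bias of the policy gradient estimate.
    T = int(rng.geometric(1.0 - gamma))
    for t in range(T):
        mean = float(theta @ s)
        a = mean + rng.standard_normal()      # sample Gaussian action
        s_next, r = env_step(s, a)
        # Critic step: TD(0) update of the linear value approximation.
        td_error = r + gamma * float(w @ phi(s_next)) - float(w @ phi(s))
        w += alpha_critic * td_error * phi(s)
        # Actor step: policy gradient update, using the TD error as a
        # (biased) advantage estimate; for a unit-variance Gaussian
        # policy, grad of log pi w.r.t. theta is (a - mean) * s.
        theta += alpha_actor * (gamma ** t) * td_error * (a - mean) * s
        s = s_next
```

The geometric horizon is what makes the bias controllable: longer expected rollouts (gamma closer to 1) trade more critic evaluations per episode for a gradient estimate closer to the true discounted policy gradient.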
Pages: 2433–2467 (34 pages)
Related papers (50 in total)
  • [21] Reinforcement learning with actor-critic for knowledge graph reasoning
    Zhang, Linli
    Li, Dewei
    Xi, Yugeng
    Jia, Shuai
    SCIENCE CHINA-INFORMATION SCIENCES, 2020, 63 (06)
  • [22] Actor-critic reinforcement learning for bidding in bilateral negotiation
    Arslan, Furkan
    Aydogan, Reyhan
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2022, 30 (05) : 1695 - 1714
  • [24] A Sandpile Model for Reliable Actor-Critic Reinforcement Learning
    Peng, Yiming
    Chen, Gang
    Zhang, Mengjie
    Pang, Shaoning
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 4014 - 4021
  • [26] Addressing Function Approximation Error in Actor-Critic Methods
    Fujimoto, Scott
    van Hoof, Herke
    Meger, David
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [27] Actor-Critic Reinforcement Learning for Control With Stability Guarantee
    Han, Minghao
    Zhang, Lixian
    Wang, Jun
    Pan, Wei
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04) : 6217 - 6224
  • [28] Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
    Wu, Yue
    Zhai, Shuangfei
    Srivastava, Nitish
    Susskind, Joshua
    Zhang, Jian
    Salakhutdinov, Ruslan
    Goh, Hanlin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [29] Deep Actor-Critic Reinforcement Learning for Anomaly Detection
    Zhong, Chen
    Gursoy, M. Cenk
    Velipasalar, Senem
    2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019
  • [30] MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
    Baheri, Betis
    Tronge, Jacob
    Fang, Bo
    Li, Ang
    Chaudhary, Vipin
    Guan, Qiang
    2022 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, IPCCC, 2022