Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

Cited by: 0
Authors
Zhu, Hanlin [1 ]
Rashidinejad, Paria [1 ]
Jiao, Jiantao [1 ,2 ]
Affiliations
[1] Univ Calif Berkeley, EECS, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Stat, Berkeley, CA 94720 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage. Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm, where the critic returns evaluations of the actor (policy) that are pessimistic relative to the offline data and have a small average (importance-weighted) Bellman error. Compared to existing methods, our algorithm simultaneously offers a number of advantages: (1) It achieves the optimal statistical rate of 1/√N, where N is the size of the offline dataset, in converging to the best policy covered in the offline dataset, even when combined with general function approximators. (2) It relies on a weaker, average notion of policy coverage (compared to ℓ∞ single-policy concentrability) that exploits the structure of policy visitations. (3) It outperforms the data-collection behavior policy over a wide range of hyperparameters. We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm. The code is available at https://github.com/zhuhl98/ACrab.
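The abstract describes a critic that returns pessimistic value estimates while keeping the average (importance-weighted) Bellman error small, and an actor updated against those estimates. The following is a minimal toy sketch of that idea in Python, not the authors' implementation (see the GitHub link above): the toy MDP, the synthetic dataset, the unit importance weights, the random-search critic, and names such as critic_objective and lmbda are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9

# Hypothetical offline dataset: N transitions (s, a, r, s') from some behavior policy.
N = 500
S = rng.integers(nS, size=N)
A = rng.integers(nA, size=N)
R = rng.normal(size=N)
S_next = rng.integers(nS, size=N)

logits = np.zeros((nS, nA))                # actor: tabular softmax policy
pi = np.full((nS, nA), 1.0 / nA)
w = np.ones(N)                             # importance weights d^pi/d^mu, fixed to 1 for simplicity
d0 = np.full(nS, 1.0 / nS)                 # initial-state distribution (assumed uniform)
lmbda, lr = 1.0, 0.1                       # illustrative regularization weight and actor step size

def avg_bellman_error(Q, pi):
    # Absolute value of the importance-weighted *average* TD error over the dataset.
    v_next = (pi[S_next] * Q[S_next]).sum(axis=1)
    td = R + gamma * v_next - Q[S, A]
    return np.abs(np.mean(w * td))

def critic_objective(Q, pi):
    # Pessimism: prefer Q with a small estimated policy value, penalized by the
    # average Bellman error so the estimate stays consistent with the data.
    value = (d0[:, None] * pi * Q).sum()
    return value + lmbda * avg_bellman_error(Q, pi)

for it in range(50):
    # Critic: pick the most pessimistic candidate (crude random search over a toy class).
    candidates = [rng.normal(size=(nS, nA)) for _ in range(200)]
    Q = min(candidates, key=lambda q: critic_objective(q, pi))
    # Actor: softmax (exponentiated-gradient) update against the pessimistic critic.
    logits = logits + lr * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi = pi / pi.sum(axis=1, keepdims=True)

In the actual algorithm the critic is drawn from a general function class and the weights estimate the marginalized importance ratio d^π/d^μ; they are fixed to 1 here only to keep the sketch self-contained and runnable.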
Pages: 24
Related Papers
50 records in total
  • [41] An Actor-Critic Hierarchical Reinforcement Learning Model for Course Recommendation
    Liang, Kun
    Zhang, Guoqiang
    Guo, Jinhui
    Li, Wentao
    [J]. ELECTRONICS, 2023, 12 (24)
  • [42] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16): 10335 - 10349
  • [43] Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
    Zhou, Ruida
    Liu, Tao
    Cheng, Min
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [44] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    [J]. 2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [45] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    [J]. SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [46] Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
    Fan, Zhou
    Su, Rui
    Zhang, Weinan
    Yu, Yong
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2279 - 2285
  • [47] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    [J]. 2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [48] Actor-critic reinforcement learning for the feedback control of a swinging chain
    Dengler, C.
    Lohmann, B.
    [J]. IFAC PAPERSONLINE, 2018, 51 (13): 378 - 383
  • [49] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    [J]. Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [50] Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning
    Lee, Junseo
    Heo, Jaeseok
    Kim, Dohyeong
    Lee, Gunmin
    Oh, Songhwai
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 7568 - 7573