Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

Cited by: 0
Authors
Zhu, Hanlin [1 ]
Rashidinejad, Paria [1 ]
Jiao, Jiantao [1 ,2 ]
Affiliations
[1] Univ Calif Berkeley, EECS, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Stat, Berkeley, CA 94720 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage. Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm, where the critic returns evaluations of the actor (policy) that are pessimistic relative to the offline data and have a small average (importance-weighted) Bellman error. Compared to existing methods, our algorithm simultaneously offers a number of advantages: (1) It achieves the optimal statistical rate of 1/√N, where N is the size of the offline dataset, in converging to the best policy covered in the offline dataset, even when combined with general function approximators. (2) It relies on a weaker, average notion of policy coverage (compared to ℓ∞ single-policy concentrability) that exploits the structure of policy visitations. (3) It outperforms the data-collection behavior policy over a wide range of hyperparameters. We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm. The code is available at https://github.com/zhuhl98/ACrab.
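The abstract describes a critic that returns pessimistic value estimates while keeping the average (importance-weighted) Bellman error small, and an actor updated against those estimates. The following is a minimal toy sketch of that idea in Python, not the authors' implementation (see the GitHub link above): the toy MDP, the synthetic dataset, the unit importance weights, the random-search critic, and names such as critic_objective and lmbda are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9

# Hypothetical offline dataset: N transitions (s, a, r, s') from some behavior policy.
N = 500
S = rng.integers(nS, size=N)
A = rng.integers(nA, size=N)
R = rng.normal(size=N)
S_next = rng.integers(nS, size=N)

logits = np.zeros((nS, nA))                # actor: tabular softmax policy
pi = np.full((nS, nA), 1.0 / nA)
w = np.ones(N)                             # importance weights d^pi/d^mu, fixed to 1 for simplicity
d0 = np.full(nS, 1.0 / nS)                 # initial-state distribution (assumed uniform)
lmbda, lr = 1.0, 0.1                       # illustrative regularization weight and actor step size

def avg_bellman_error(Q, pi):
    # Absolute value of the importance-weighted *average* TD error over the dataset.
    v_next = (pi[S_next] * Q[S_next]).sum(axis=1)
    td = R + gamma * v_next - Q[S, A]
    return np.abs(np.mean(w * td))

def critic_objective(Q, pi):
    # Pessimism: prefer Q with a small estimated policy value, penalized by the
    # average Bellman error so the estimate stays consistent with the data.
    value = (d0[:, None] * pi * Q).sum()
    return value + lmbda * avg_bellman_error(Q, pi)

for it in range(50):
    # Critic: pick the most pessimistic candidate (crude random search over a toy class).
    candidates = [rng.normal(size=(nS, nA)) for _ in range(200)]
    Q = min(candidates, key=lambda q: critic_objective(q, pi))
    # Actor: softmax (exponentiated-gradient) update against the pessimistic critic.
    logits = logits + lr * Q
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi = pi / pi.sum(axis=1, keepdims=True)

In the actual algorithm the critic is drawn from a general function class and the weights estimate the marginalized importance ratio d^π/d^μ; they are fixed to 1 here only to keep the sketch self-contained and runnable.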
Pages: 24
Related Papers
50 records in total
  • [41] An Actor-Critic Hierarchical Reinforcement Learning Model for Course Recommendation
    Liang, Kun
    Zhang, Guoqiang
    Guo, Jinhui
    Li, Wentao
    [J]. ELECTRONICS, 2023, 12 (24)
  • [42] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16): 10335 - 10349
  • [43] Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
    Zhou, Ruida
    Liu, Tao
    Cheng, Min
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [44] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    [J]. 2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [45] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    [J]. SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [46] Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
    Fan, Zhou
    Su, Rui
    Zhang, Weinan
    Yu, Yong
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2279 - 2285
  • [47] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    [J]. 2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [48] Actor-critic reinforcement learning for the feedback control of a swinging chain
    Dengler, C.
    Lohmann, B.
    [J]. IFAC PAPERSONLINE, 2018, 51 (13): 378 - 383
  • [49] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    [J]. Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [50] Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning
    Lee, Junseo
    Heo, Jaeseok
    Kim, Dohyeong
    Lee, Gunmin
    Oh, Songhwai
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 7568 - 7573