Efficient Online Reinforcement Learning with Offline Data

Cited by: 0

Authors:

Ball, Philip J. [1]

Smith, Laura [2]

Kostrikov, Ilya [2]

Levine, Sergey [2]

Affiliations:

[1] Univ Oxford, Oxford, England

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202 | 2023 / Vol. 202

Keywords:

NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead. We have released our code here: github.com/ikostrikov/rlpd.

Pages: 18

50 items in total

[1] Sample Efficient Offline-to-Online Reinforcement Learning
Guo, Siyuan
Zou, Lixin
Chen, Hechang
Qu, Bohao
Chi, Haotian
Yu, Philip S.
Chang, Yi
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (03) : 1299 - 1310
[2] Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Nie, Allen
Flet-Berliac, Yannis
Jordan, Deon R.
Steenbergen, William
Brunskill, Emma
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[3] Data-Efficient Offline Reinforcement Learning with Approximate Symmetries
Angelotti, Giorgio
Drougard, Nicolas
Chanel, Caroline P. C.
[J]. AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2023, 2024, 14546 : 164 - 186
[4] Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
Xie, Tengyang
Jiang, Nan
Wang, Huan
Xiong, Caiming
Bai, Yu
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[5] Offline Evaluation of Online Reinforcement Learning Algorithms
Mandel, Travis
Liu, Yun-En
Brunskill, Emma
Popovic, Zoran
[J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1926 - 1933
[6] Efficient Offline Reinforcement Learning With Relaxed Conservatism
Huang, Longyang
Dong, Botao
Zhang, Weidong
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5260 - 5272
[7] Efficient Diffusion Policies for Offline Reinforcement Learning
Kang, Bingyi
Ma, Xiao
Du, Chao
Pang, Tianyu
Yan, Shuicheng
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[8] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
Zheng, Han
Luo, Xufang
Wei, Pengfei
Song, Xuan
Li, Dongsheng
Jiang, Jing
[J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11372 - 11380
[9] Online and Offline Reinforcement Learning by Planning with a Learned Model
Schrittwieser, Julian
Hubert, Thomas
Mandhane, Amol
Barekatain, Mohammadamin
Antonoglou, Ioannis
Silver, David
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
[10] Robust Reinforcement Learning using Offline Data
Panaganti, Kishan
Xu, Zaiyan
Kalathil, Dileep
Ghavamzadeh, Mohammad
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
