Efficient Online Reinforcement Learning with Offline Data

被引:0
|
作者
Ball, Philip J. [1 ]
Smith, Laura [2 ]
Kostrikov, Ilya [2 ]
Levine, Sergey [2 ]
机构
[1] Univ Oxford, Oxford, England
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
关键词
NEURAL-NETWORKS;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead. We have released our code here: github.com/ikostrikov/rlpd.
引用
收藏
页数:18
相关论文
共 50 条
  • [1] Sample Efficient Offline-to-Online Reinforcement Learning
    Guo, Siyuan
    Zou, Lixin
    Chen, Hechang
    Qu, Bohao
    Chi, Haotian
    Yu, Philip S.
    Chang, Yi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (03) : 1299 - 1310
  • [2] Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
    Nie, Allen
    Flet-Berliac, Yannis
    Jordan, Deon R.
    Steenbergen, William
    Brunskill, Emma
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Data-Efficient Offline Reinforcement Learning with Approximate Symmetries
    Angelotti, Giorgio
    Drougard, Nicolas
    Chanel, Caroline P. C.
    [J]. AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2023, 2024, 14546 : 164 - 186
  • [4] Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
    Xie, Tengyang
    Jiang, Nan
    Wang, Huan
    Xiong, Caiming
    Bai, Yu
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Offline Evaluation of Online Reinforcement Learning Algorithms
    Mandel, Travis
    Liu, Yun-En
    Brunskill, Emma
    Popovic, Zoran
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1926 - 1933
  • [6] Efficient Offline Reinforcement Learning With Relaxed Conservatism
    Huang, Longyang
    Dong, Botao
    Zhang, Weidong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5260 - 5272
  • [7] Efficient Diffusion Policies for Offline Reinforcement Learning
    Kang, Bingyi
    Ma, Xiao
    Du, Chao
    Pang, Tianyu
    Yan, Shuicheng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
    Zheng, Han
    Luo, Xufang
    Wei, Pengfei
    Song, Xuan
    Li, Dongsheng
    Jiang, Jing
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11372 - 11380
  • [9] Online and Offline Reinforcement Learning by Planning with a Learned Model
    Schrittwieser, Julian
    Hubert, Thomas
    Mandhane, Amol
    Barekatain, Mohammadamin
    Antonoglou, Ioannis
    Silver, David
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Robust Reinforcement Learning using Offline Data
    Panaganti, Kishan
    Xu, Zaiyan
    Kalathil, Dileep
    Ghavamzadeh, Mohammad
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,