Online and Offline Reinforcement Learning by Planning with a Learned Model

Cited: 0
Authors
Schrittwieser, Julian [1]
Hubert, Thomas [1]
Mandhane, Amol [1]
Barekatain, Mohammadamin [1]
Antonoglou, Ioannis [1,2]
Silver, David [1,2]
Affiliations
[1] DeepMind, London, England
[2] UCL, London, England
Keywords
GO; ENVIRONMENT; SHOGI; CHESS; LEVEL
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both for the online case of interacting with the environment and the offline case of learning from a fixed dataset. However, to date no single unified algorithm has demonstrated state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm, which uses model-based policy and value improvement operators to compute new, improved training targets on existing data points, allowing efficient learning for data budgets varying by several orders of magnitude. We further show that Reanalyse can also be used to learn entirely from demonstrations without any environment interaction, as in the case of offline reinforcement learning (offline RL). Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL. In contrast to previous work, our algorithm does not require any special adaptations for the off-policy or offline RL settings. MuZero Unplugged sets new state-of-the-art results in the RL Unplugged offline RL benchmark as well as in the online RL benchmark of Atari in the standard 200 million frame setting.
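As a rough illustration of the Reanalyse idea described in the abstract, the sketch below re-runs search with the current network over a stored trajectory to produce fresh policy and value targets. Every name here (Step, mcts_policy_and_value, the n-step return helper, the 0.997 discount) is an illustrative assumption, not the authors' implementation.

import random
from dataclasses import dataclass

@dataclass
class Step:
    observation: object
    reward: float

def mcts_policy_and_value(network, observation):
    # Placeholder for search with the learned model: a real
    # implementation would unroll the learned dynamics inside an MCTS
    # tree. Here we assume `network` directly returns a
    # (policy_distribution, value) pair for the observation.
    return network(observation)

def n_step_return(trajectory, network, t, n=5, discount=0.997):
    # Value improvement: n-step discounted return bootstrapped from a
    # fresh value estimate at step t+n, not the stale value stored
    # when the data was generated.
    g, k = 0.0, 0
    while k < n and t + k < len(trajectory):
        g += (discount ** k) * trajectory[t + k].reward
        k += 1
    if t + k < len(trajectory):  # bootstrap unless the episode ended
        _, v = mcts_policy_and_value(network, trajectory[t + k].observation)
        g += (discount ** k) * v
    return g

def reanalyse(trajectory, network, unroll_steps=5):
    # Recompute training targets on existing data with the current
    # model; the same loop serves online, off-policy, and fully
    # offline data, which is the unification the abstract claims.
    start = random.randrange(len(trajectory))
    targets = []
    for t in range(start, min(start + unroll_steps, len(trajectory))):
        obs = trajectory[t].observation
        # Policy improvement: fresh search visit counts replace the
        # stale policy target recorded at data-generation time.
        policy_target, _ = mcts_policy_and_value(network, obs)
        targets.append((obs, policy_target,
                        n_step_return(trajectory, network, t)))
    return targets

For a toy run, reanalyse([Step(s, 0.0) for s in range(10)], lambda obs: ([0.5, 0.5], 0.0)) yields targets that would then feed the usual MuZero training step.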
Pages: 12
Related papers
50 records in total
  • [1] A maintenance planning framework using online and offline deep reinforcement learning
    Bukhsh, Zaharah A.
    Molegraaf, Hajo
    Jansen, Nils
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023
  • [2] Efficient Online Reinforcement Learning with Offline Data
    Ball, Philip J.
    Smith, Laura
    Kostrikov, Ilya
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023
  • [3] Offline Evaluation of Online Reinforcement Learning Algorithms
    Mandel, Travis
    Liu, Yun-En
    Brunskill, Emma
    Popovic, Zoran
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1926 - 1933
  • [4] Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
    Zheng, Han
    Luo, Xufang
    Wei, Pengfei
    Song, Xuan
    Li, Dongsheng
    Jiang, Jing
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11372 - 11380
  • [5] Sample Efficient Offline-to-Online Reinforcement Learning
    Guo, Siyuan
    Zou, Lixin
    Chen, Hechang
    Qu, Bohao
    Chi, Haotian
    Yu, Philip S.
    Chang, Yi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (03) : 1299 - 1310
  • [6] Path Planning for Construction Machines by Offline Reinforcement Learning
    Nakayama, Tatsuya
    Kashi, Haruki
    Uchimura, Yutaka
    [J]. IEEJ TRANSACTIONS ON INDUSTRY APPLICATIONS, 2024, 144 (05) : 367 - 373
  • [7] Offline replay supports planning in human reinforcement learning
    Momennejad, Ida
    Otto, A. Ross
    Daw, Nathaniel D.
    Norman, Kenneth A.
    [J]. ELIFE, 2018, 7
  • [8] Learning Aerial Docking via Offline-to-Online Reinforcement Learning
    Tao, Yang
    Feng, Yuting
    Yu, Yushu
    [J]. 2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 305 - 309
  • [9] Offline Planning and Online Learning Under Recovering Rewards
    Simchi-Levi, David
    Zheng, Zeyu
    Zhu, Feng
    [J]. MANAGEMENT SCIENCE, 2024
  • [10] RLSynC: Offline-Online Reinforcement Learning for Synthon Completion
    Baker, Frazier N.
    Chen, Ziqi
    Adu-Ampratwum, Daniel
    Ning, Xia
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (17) : 6723 - 6735