MOReL: Model-Based Offline Reinforcement Learning

Cited by: 0
Authors
Kidambi, Rahul [1 ]
Rajeswaran, Aravind [2 ,3 ]
Netrapalli, Praneeth [4 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Google Res, Brain Team, Mountain View, CA 94043 USA
[4] Microsoft Res, Bengaluru, India
Keywords
ALGORITHM; LEVEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL enjoys strong performance guarantees for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., in model learning, planning etc.) to directly translate into improvements for offline RL. Project webpage: https://sites.google.com/view/morel
Pages: 14
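The abstract above outlines a two-step framework: (a) build a pessimistic MDP (P-MDP) from the offline dataset, and (b) plan or learn a policy inside it. Below is a minimal sketch of step (a), assuming an ensemble of learned one-step dynamics models and a disagreement-based unknown state-action detector; the class name, threshold, and penalty value are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

# Sketch of a pessimistic MDP (P-MDP): transitions from "unknown" state-action
# pairs (where the learned model ensemble disagrees beyond a threshold) are
# redirected to an absorbing HALT state with a large negative reward (-kappa),
# so that any policy's return in the P-MDP lower-bounds its true return.
class PessimisticMDP:
    def __init__(self, ensemble, reward_fn, disagreement_threshold=0.1, kappa=100.0):
        self.ensemble = ensemble                 # list of callables: model(s, a) -> s'
        self.reward_fn = reward_fn               # callable: reward(s, a) -> float
        self.threshold = disagreement_threshold  # unknown state-action cutoff (assumed value)
        self.kappa = kappa                       # HALT penalty (assumed value)
        self.halt = None                         # sentinel for the absorbing HALT state

    def is_unknown(self, s, a):
        # Flag (s, a) as unknown if the ensemble's next-state predictions
        # spread out more than the allowed threshold (max pairwise distance).
        preds = np.stack([m(s, a) for m in self.ensemble])
        disagreement = np.max(np.linalg.norm(preds[:, None] - preds[None, :], axis=-1))
        return disagreement > self.threshold

    def step(self, s, a):
        # HALT is absorbing: stay there and keep receiving the penalty.
        if s is self.halt:
            return self.halt, -self.kappa, True
        # Unknown region: enter HALT with a large negative reward.
        if self.is_unknown(s, a):
            return self.halt, -self.kappa, True
        # Known region: follow the learned dynamics (here, the ensemble mean).
        next_s = np.mean(np.stack([m(s, a) for m in self.ensemble]), axis=0)
        return next_s, self.reward_fn(s, a), False
```

Step (b) then amounts to running any standard planner or policy-optimization method on rollouts from this surrogate `step` function instead of the real environment.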