MOReL: Model-Based Offline Reinforcement Learning

Cited by: 0
Authors
Kidambi, Rahul [1 ]
Rajeswaran, Aravind [2 ,3 ]
Netrapalli, Praneeth [4 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Google Res, Brain Team, Mountain View, CA 94043 USA
[4] Microsoft Res, Bengaluru, India
Keywords
ALGORITHM; LEVEL;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL enjoys strong performance guarantees for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., in model learning, planning etc.) to directly translate into improvements for offline RL. Project webpage: https://sites.google.com/view/morel
Pages: 14
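The abstract above outlines a two-step framework: (a) build a pessimistic MDP (P-MDP) from the offline dataset, and (b) plan or learn a policy inside it. Below is a minimal sketch of step (a), assuming an ensemble of learned one-step dynamics models and a disagreement-based unknown state-action detector; the class name, threshold, and penalty value are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

# Sketch of a pessimistic MDP (P-MDP): transitions from "unknown" state-action
# pairs (where the learned model ensemble disagrees beyond a threshold) are
# redirected to an absorbing HALT state with a large negative reward (-kappa),
# so that any policy's return in the P-MDP lower-bounds its true return.
class PessimisticMDP:
    def __init__(self, ensemble, reward_fn, disagreement_threshold=0.1, kappa=100.0):
        self.ensemble = ensemble                 # list of callables: model(s, a) -> s'
        self.reward_fn = reward_fn               # callable: reward(s, a) -> float
        self.threshold = disagreement_threshold  # unknown state-action cutoff (assumed value)
        self.kappa = kappa                       # HALT penalty (assumed value)
        self.halt = None                         # sentinel for the absorbing HALT state

    def is_unknown(self, s, a):
        # Flag (s, a) as unknown if the ensemble's next-state predictions
        # spread out more than the allowed threshold (max pairwise distance).
        preds = np.stack([m(s, a) for m in self.ensemble])
        disagreement = np.max(np.linalg.norm(preds[:, None] - preds[None, :], axis=-1))
        return disagreement > self.threshold

    def step(self, s, a):
        # HALT is absorbing: stay there and keep receiving the penalty.
        if s is self.halt:
            return self.halt, -self.kappa, True
        # Unknown region: enter HALT with a large negative reward.
        if self.is_unknown(s, a):
            return self.halt, -self.kappa, True
        # Known region: follow the learned dynamics (here, the ensemble mean).
        next_s = np.mean(np.stack([m(s, a) for m in self.ensemble]), axis=0)
        return next_s, self.reward_fn(s, a), False
```

Step (b) then amounts to running any standard planner or policy-optimization method on rollouts from this surrogate `step` function instead of the real environment.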