MOReL: Model-Based Offline Reinforcement Learning

Cited by: 0
Authors
Kidambi, Rahul [1 ]
Rajeswaran, Aravind [2 ,3 ]
Netrapalli, Praneeth [4 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Google Res, Brain Team, Mountain View, CA 94043 USA
[4] Microsoft Res, Bengaluru, India
Keywords
ALGORITHM; LEVEL;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL enjoys strong performance guarantees for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., in model learning, planning, etc.) to directly translate into improvements for offline RL. Project webpage: https://sites.google.com/view/morel
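The two-step recipe in the abstract can be made concrete with a short sketch. The Python code below is a minimal, illustrative rendering of step (a), the P-MDP construction: an ensemble of learned dynamics models is queried at each transition, and if the ensemble members disagree beyond a threshold, the state-action pair is treated as unknown and the episode is absorbed into a heavily penalized HALT state. All names here (PessimisticMDP, disagreement_threshold, halt_penalty, the models' predict method) are hypothetical illustrations, not the authors' released code.

import numpy as np

class PessimisticMDP:
    """Illustrative P-MDP wrapper around an ensemble of learned dynamics models.

    Transitions where the ensemble predictions disagree beyond a threshold are
    redirected to an absorbing HALT state with a large negative reward, so that
    returns in this surrogate pessimistically lower-bound true returns.
    """

    def __init__(self, models, reward_fn, disagreement_threshold, halt_penalty):
        self.models = models              # fitted dynamics models (at least 2)
        self.reward_fn = reward_fn        # known or learned reward function
        self.threshold = disagreement_threshold
        self.halt_penalty = halt_penalty  # large negative constant (-kappa)
        self.halted = False

    def step(self, state, action):
        if self.halted:
            # HALT is absorbing: stay there and keep receiving the penalty.
            return state, self.halt_penalty, True

        # Each ensemble member predicts the next state.
        preds = np.stack([m.predict(state, action) for m in self.models])

        # Unknown-state detector: maximum pairwise disagreement in the ensemble.
        disagreement = max(
            np.linalg.norm(preds[i] - preds[j])
            for i in range(len(preds))
            for j in range(i + 1, len(preds))
        )
        if disagreement > self.threshold:
            self.halted = True
            return state, self.halt_penalty, True

        # Known region: step using the ensemble-mean prediction.
        return preds.mean(axis=0), self.reward_fn(state, action), False

Step (b) then runs an ordinary planner or policy-gradient method against this wrapped model as if it were the real environment. Because unknown transitions can only reduce a policy's surrogate return, optimizing against the P-MDP guards against model exploitation, up to the gap terms the paper quantifies.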
Pages: 14
Related Papers
50 records in total
  • [1] Model-Based Offline Reinforcement Learning with Local Misspecification
    Dong, Kefan
    Flet-Berliac, Yannis
    Nie, Allen
    Brunskill, Emma
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023: 7423 - 7431
  • [2] Offline Reinforcement Learning with Reverse Model-based Imagination
    Wang, Jianhao
    Li, Wenzhe
    Jiang, Haozhe
    Zhu, Guangxiang
    Li, Siyuan
    Zhang, Chongjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    [J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [4] Weighted model estimation for offline model-based reinforcement learning
    Hishinuma, Toru
    Senda, Kei
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [5] SETTLING THE SAMPLE COMPLEXITY OF MODEL-BASED OFFLINE REINFORCEMENT LEARNING
    Li, Gen
    Shi, Laixi
    Chen, Yuxin
    Chi, Yuejie
    Wei, Yuting
    [J]. ANNALS OF STATISTICS, 2024, 52 (01): 233 - 260
  • [6] Model-based offline reinforcement learning for sustainable fishery management
    Ju, Jun
    Kurniawati, Hanna
    Kroese, Dirk
    Ye, Nan
    [J]. EXPERT SYSTEMS, 2023, 42 (01)
  • [7] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire
    Zhou, Xiao-Hu
    Hou, Zeng-Guang
    [J]. Institute of Electrical and Electronics Engineers Inc., 2024, (06)
  • [8] Bayesian Model-Based Offline Reinforcement Learning for Product Allocation
    Jenkins, Porter
    Wei, Hua
    Jenkins, J. Stockton
    Li, Zhenhui
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 12531 - 12537
  • [9] OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
    Wu, Fan
    Zhang, Rui
    Yi, Qi
    Gao, Yunkai
    Guo, Jiaming
    Peng, Shaohui
    Lan, Siming
    Han, Husheng
    Pan, Yansong
    Yuan, Kaizhao
    Jin, Pengwei
    Chen, Ruizhi
    Chen, Yunji
    Li, Ling
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 15897 - 15905
  • [10] Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
    Swazinna, Phillip
    Udluft, Steffen
    Hein, Daniel
    Runkler, Thomas
    [J]. IFAC PAPERSONLINE, 2022, 55 (15): 19 - 26