MOReL: Model-Based Offline Reinforcement Learning

Cited by: 0
Authors
Kidambi, Rahul [1 ]
Rajeswaran, Aravind [2 ,3 ]
Netrapalli, Praneeth [4 ]
Joachims, Thorsten [1 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Google Res, Brain Team, Mountain View, CA 94043 USA
[4] Microsoft Res, Bengaluru, India
Keywords
ALGORITHM; LEVEL;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL enjoys strong performance guarantees for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., in model learning, planning, etc.) to directly translate into improvements for offline RL. Project webpage: https://sites.google.com/view/morel
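The two-step recipe in the abstract can be made concrete with a short sketch. The Python code below is a minimal, illustrative rendering of step (a), the P-MDP construction: an ensemble of learned dynamics models is queried at each transition, and if the ensemble members disagree beyond a threshold, the state-action pair is treated as unknown and the episode is absorbed into a heavily penalized HALT state. All names here (PessimisticMDP, disagreement_threshold, halt_penalty, the models' predict method) are hypothetical illustrations, not the authors' released code.

import numpy as np

class PessimisticMDP:
    """Illustrative P-MDP wrapper around an ensemble of learned dynamics models.

    Transitions where the ensemble predictions disagree beyond a threshold are
    redirected to an absorbing HALT state with a large negative reward, so that
    returns in this surrogate pessimistically lower-bound true returns.
    """

    def __init__(self, models, reward_fn, disagreement_threshold, halt_penalty):
        self.models = models              # fitted dynamics models (at least 2)
        self.reward_fn = reward_fn        # known or learned reward function
        self.threshold = disagreement_threshold
        self.halt_penalty = halt_penalty  # large negative constant (-kappa)
        self.halted = False

    def step(self, state, action):
        if self.halted:
            # HALT is absorbing: stay there and keep receiving the penalty.
            return state, self.halt_penalty, True

        # Each ensemble member predicts the next state.
        preds = np.stack([m.predict(state, action) for m in self.models])

        # Unknown-state detector: maximum pairwise disagreement in the ensemble.
        disagreement = max(
            np.linalg.norm(preds[i] - preds[j])
            for i in range(len(preds))
            for j in range(i + 1, len(preds))
        )
        if disagreement > self.threshold:
            self.halted = True
            return state, self.halt_penalty, True

        # Known region: step using the ensemble-mean prediction.
        return preds.mean(axis=0), self.reward_fn(state, action), False

Step (b) then runs an ordinary planner or policy-gradient method against this wrapped model as if it were the real environment. Because unknown transitions can only reduce a policy's surrogate return, optimizing against the P-MDP guards against model exploitation, up to the gap terms the paper quantifies.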
Pages: 14
Related Papers
50 records in total
  • [1] Model-Based Offline Reinforcement Learning with Local Misspecification
    Dong, Kefan
    Flet-Berliac, Yannis
    Nie, Allen
    Brunskill, Emma
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023: 7423 - 7431
  • [2] Offline Reinforcement Learning with Reverse Model-based Imagination
    Wang, Jianhao
    Li, Wenzhe
    Jiang, Haozhe
    Zhu, Guangxiang
    Li, Siyuan
    Zhang, Chongjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    [J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [4] Weighted model estimation for offline model-based reinforcement learning
    Hishinuma, Toru
    Senda, Kei
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [5] SETTLING THE SAMPLE COMPLEXITY OF MODEL-BASED OFFLINE REINFORCEMENT LEARNING
    Li, Gen
    Shi, Laixi
    Chen, Yuxin
    Chi, Yuejie
    Wei, Yuting
    [J]. ANNALS OF STATISTICS, 2024, 52 (01): 233 - 260
  • [6] Model-based offline reinforcement learning for sustainable fishery management
    Ju, Jun
    Kurniawati, Hanna
    Kroese, Dirk
    Ye, Nan
    [J]. EXPERT SYSTEMS, 2023, 42 (01)
  • [7] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire
    Zhou, Xiao-Hu
    Hou, Zeng-Guang
    [J]. Institute of Electrical and Electronics Engineers Inc., 2024, (06)
  • [8] Bayesian Model-Based Offline Reinforcement Learning for Product Allocation
    Jenkins, Porter
    Wei, Hua
    Jenkins, J. Stockton
    Li, Zhenhui
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 12531 - 12537
  • [9] OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
    Wu, Fan
    Zhang, Rui
    Yi, Qi
    Gao, Yunkai
    Guo, Jiaming
    Peng, Shaohui
    Lan, Siming
    Han, Husheng
    Pan, Yansong
    Yuan, Kaizhao
    Jin, Pengwei
    Chen, Ruizhi
    Chen, Yunji
    Li, Ling
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 15897 - 15905
  • [10] Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning
    Swazinna, Phillip
    Udluft, Steffen
    Hein, Daniel
    Runkler, Thomas
    [J]. IFAC PAPERSONLINE, 2022, 55 (15): 19 - 26