Model-Based Offline Reinforcement Learning with Local Misspecification

Cited: 0
Authors
Dong, Kefan [1 ]
Flet-Berliac, Yannis [1 ]
Nie, Allen [1 ]
Brunskill, Emma [1 ]
Affiliations
[1] Stanford University, Stanford, CA 94305, USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a lower bound on policy performance for model-based offline reinforcement learning that explicitly captures dynamics model misspecification and distribution mismatch, and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimistic approximations to the value function. Our key insight is to jointly select over dynamics models and policies: as long as a dynamics model accurately represents the dynamics of the state-action pairs visited by a given policy, the value of that policy can be approximated well. We analyze our lower bound in the LQR setting and show that it is competitive with previous lower bounds for policy selection on a set of D4RL tasks.
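To make the abstract's selection rule concrete, here is a minimal sketch, not the authors' implementation: each (dynamics model, policy) pair is scored by a pessimistic lower bound, the policy's estimated return inside the model minus a penalty for the model's error on the state-action pairs that policy visits. The names `value_in_model` and `local_model_error`, the dataclass fields, and the linear penalty form are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's code) of joint model-policy selection
# via a pessimistic lower bound, as described in the abstract.
from dataclasses import dataclass


@dataclass
class Candidate:
    model_id: str             # which learned dynamics model
    policy_id: str            # which candidate policy
    value_in_model: float     # return of the policy rolled out in the model
    local_model_error: float  # model error on the policy's own visitation


def lower_bound(c: Candidate, horizon: int, penalty_scale: float = 1.0) -> float:
    """Pessimistic value: model-based return minus an error penalty that
    grows with horizon, so locally inaccurate models are distrusted."""
    return c.value_in_model - penalty_scale * horizon * c.local_model_error


def select(candidates: list[Candidate], horizon: int) -> Candidate:
    """Jointly select the (model, policy) pair with the best lower bound."""
    return max(candidates, key=lambda c: lower_bound(c, horizon))


if __name__ == "__main__":
    # Toy example: the higher-value policy is evaluated under a model that
    # is locally inaccurate for it, so the pessimistic rule prefers the
    # slightly lower-value policy whose model fits its visited states well.
    cands = [
        Candidate("model_A", "pi_1", value_in_model=10.0, local_model_error=0.20),
        Candidate("model_B", "pi_2", value_in_model=9.0, local_model_error=0.01),
    ]
    best = select(cands, horizon=50)
    print(best.model_id, best.policy_id)  # model_B pi_2
```

The property this mirrors is that the dynamics model only needs to be accurate locally, on the selected policy's own visitation distribution, rather than accurate everywhere.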
Pages: 7423-7431
Page count: 9
Related papers
50 records in total (first 10 shown)
  • [1] MOReL: Model-Based Offline Reinforcement Learning. Kidambi, Rahul; Rajeswaran, Aravind; Netrapalli, Praneeth; Joachims, Thorsten. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
  • [2] Offline Reinforcement Learning with Reverse Model-based Imagination. Wang, Jianhao; Li, Wenzhe; Jiang, Haozhe; Zhu, Guangxiang; Li, Siyuan; Zhang, Chongjie. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [3] Offline Model-Based Reinforcement Learning for Tokamak Control. Char, Ian; Abbate, Joseph; Bardoczi, Laszlo; Boyer, Mark D.; Chung, Youngseog; Conlin, Rory; Erickson, Keith; Mehta, Viraj; Richner, Nathan; Kolemen, Egemen; Schneider, Jeff. Learning for Dynamics and Control Conference, Vol. 211, 2023.
  • [4] Weighted model estimation for offline model-based reinforcement learning. Hishinuma, Toru; Senda, Kei. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [5] Settling the Sample Complexity of Model-Based Offline Reinforcement Learning. Li, Gen; Shi, Laixi; Chen, Yuxin; Chi, Yuejie; Wei, Yuting. Annals of Statistics, 2024, 52(1): 233-260.
  • [6] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire. Li, Hao; Zhou, Xiao-Hu; Xie, Xiao-Liang; Liu, Shi-Qi; Feng, Zhen-Qiu; Gui, Mei-Jiang; Xiang, Tian-Yu; Huang, De-Xing; Hou, Zeng-Guang. IEEE Transactions on Medical Robotics and Bionics, 2024, 6(3): 1054-1062.
  • [7] Bayesian Model-Based Offline Reinforcement Learning for Product Allocation. Jenkins, Porter; Wei, Hua; Jenkins, J. Stockton; Li, Zhenhui. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / Twelfth Symposium on Educational Advances in Artificial Intelligence, 2022: 12531-12537.
  • [8] Model-based offline reinforcement learning for sustainable fishery management. Ju, Jun; Kurniawati, Hanna; Kroese, Dirk; Ye, Nan. Expert Systems, 2023, 42(1).
  • [9] OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning. Wu, Fan; Zhang, Rui; Yi, Qi; Gao, Yunkai; Guo, Jiaming; Peng, Shaohui; Lan, Siming; Han, Husheng; Pan, Yansong; Yuan, Kaizhao; Jin, Pengwei; Chen, Ruizhi; Chen, Yunji; Li, Ling. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 14, 2024: 15897-15905.
  • [10] Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning. Swazinna, Phillip; Udluft, Steffen; Hein, Daniel; Runkler, Thomas. IFAC-PapersOnLine, 2022, 55(15): 19-26.