BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning

Cited by: 1
Authors
Yang, Yijun [1 ,2 ]
Jiang, Jing [1 ]
Wang, Zhuowei [1 ]
Duan, Qiqi [2 ]
Shi, Yuhui [2 ]
Affiliations
[1] Univ Technol Sydney, AAII, Ultimo, NSW 2007, Australia
[2] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
Keywords
Offline reinforcement learning; Multi-objective optimization; Evolution strategy
DOI
10.1007/978-3-030-97546-3_46
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to train an agent solely from a dataset of historical interactions with the environment, without any further costly or dangerous active exploration. Model-based RL (MbRL) often achieves promising performance in the offline setting thanks to its high sample efficiency and its compact model of the environment dynamics. However, it can suffer from bias and accumulated error in the model's predictions. Existing methods address this problem by adding a penalty term to the model reward, but they require careful hand-tuning of the penalty and its weight. In this paper, we instead formulate model-based offline RL as a bi-objective optimization problem: the first objective maximizes the model return, while the second objective adapts to the learning dynamics of the RL policy. This removes the need to tune the penalty and its weight, and it achieves a more advantageous trade-off between the final model return and the model's uncertainty. To solve this bi-objective problem, we develop an efficient and adaptive policy optimization algorithm based on an evolution strategy, named BiES. Experimental results on the D4RL benchmark show that our approach sets a new state of the art and significantly outperforms existing offline RL methods on long-horizon tasks.
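The abstract does not give implementation details, but its core idea, searching policy parameters with an evolution strategy against two objectives at once rather than a hand-weighted penalty, can be illustrated with a minimal sketch. Everything below (the toy model_return and model_uncertainty surrogates, the Pareto-front selection rule, and all names and hyperparameters) is an assumption for illustration, not the authors' BiES implementation.

# Minimal bi-objective evolution-strategy sketch (NumPy only); all
# objectives, names, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def model_return(theta):
    # Stand-in for the policy's return under the learned dynamics model.
    return -np.sum((theta - 1.0) ** 2)

def model_uncertainty(theta):
    # Stand-in for an uncertainty estimate along the policy's model rollouts.
    return np.sum(theta ** 2)

def dominates(a, b):
    # Pareto dominance on (return, -uncertainty) tuples: a dominates b.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def bi_objective_es(dim=8, pop_size=32, sigma=0.1, iters=200):
    theta = np.zeros(dim)
    for _ in range(iters):
        # Standard ES step: sample a population of perturbed policies.
        candidates = theta + sigma * rng.standard_normal((pop_size, dim))
        scores = [(model_return(c), -model_uncertainty(c)) for c in candidates]
        # Keep the non-dominated candidates (the population's Pareto front),
        # so neither objective needs a hand-tuned penalty weight.
        front = [i for i, s in enumerate(scores)
                 if not any(dominates(scores[j], s)
                            for j in range(pop_size) if j != i)]
        # Move the search distribution toward the mean of the front.
        theta = candidates[front].mean(axis=0)
    return theta

theta_star = bi_objective_es()
print(model_return(theta_star), model_uncertainty(theta_star))

Pareto-front selection is only one simple way to realize the trade-off without scalarizing the two objectives; the adaptive second objective and the actual update rule of BiES are described in the paper itself.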
Pages: 570-581
Page count: 12
Related Papers
50 records in total
  • [1] Model-Based Offline Adaptive Policy Optimization with Episodic Memory
    Cao, Hongye
    Wei, Qianru
    Zheng, Jiangbin
    Shi, Yanqing
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT II, 2022, 13530 : 50 - 62
  • [2] Model-Based Offline Reinforcement Learning with Uncertainty Estimation and Policy Constraint
    Zhu, J.
    Du, C.
    Dullerud, G. E.
    [J]. IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (12): 1 - 13
  • [3] MOReL: Model-Based Offline Reinforcement Learning
    Kidambi, Rahul
    Rajeswaran, Aravind
    Netrapalli, Praneeth
    Joachims, Thorsten
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
    Yang, Shentao
    Feng, Yihao
    Zhang, Shujian
    Zhou, Mingyuan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [5] MOPO: Model-based Offline Policy Optimization
    Yu, Tianhe
    Thomas, Garrett
    Yu, Lantao
    Ermon, Stefano
    Zou, James
    Levine, Sergey
    Finn, Chelsea
    Ma, Tengyu
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4736 - 4740
  • [7] Offline Model-based Adaptable Policy Learning
    Chen, Xiong-Hui
    Yu, Yang
    Li, Qingyang
    Luo, Fan-Ming
    Qin, Zhiwei
    Shang, Wenjie
    Ye, Jieping
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Model-Based Offline Reinforcement Learning with Local Misspecification
    Dong, Kefan
    Flet-Berliac, Yannis
    Nie, Allen
    Brunskill, Emma
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7423 - 7431
  • [9] Offline Reinforcement Learning with Reverse Model-based Imagination
    Wang, Jianhao
    Li, Wenzhe
    Jiang, Haozhe
    Zhu, Guangxiang
    Li, Siyuan
    Zhang, Chongjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    [J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211