BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning

Cited by: 1
Authors
Yang, Yijun [1 ,2 ]
Jiang, Jing [1 ]
Wang, Zhuowei [1 ]
Duan, Qiqi [2 ]
Shi, Yuhui [2 ]
Affiliations
[1] Univ Technol Sydney, AAII, Ultimo, NSW 2007, Australia
[2] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
Keywords
Offline reinforcement learning; Multi-objective optimization; Evolution strategy
DOI
10.1007/978-3-030-97546-3_46
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to train an agent solely from a dataset of historical interactions with the environment, without any further costly or dangerous active exploration. Model-based RL (MbRL) usually achieves promising performance in the offline setting owing to its high sample efficiency and compact modeling of the dynamic environment; however, it may suffer from bias and error accumulation in the model's predictions. Existing methods address this problem by adding a penalty term to the model reward, which requires careful hand-tuning of the penalty and its weight. In this paper, we instead formulate model-based offline RL as a bi-objective optimization problem, where the first objective maximizes the model return and the second objective adapts to the learning dynamics of the RL policy. We therefore need not tune the penalty or its weight, yet achieve a more advantageous trade-off between the final model return and the model's uncertainty. We develop an efficient and adaptive policy optimization algorithm equipped with an evolution strategy, named BiES, to solve this bi-objective optimization. Experimental results on the D4RL benchmark show that our approach sets a new state of the art and significantly outperforms existing offline RL methods on long-horizon tasks.
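The core bi-objective idea can be illustrated with a toy sketch. This is not the authors' BiES algorithm, only a minimal (1+λ)-style evolution strategy that accepts a perturbed policy parameter vector when it Pareto-dominates the incumbent on both objectives, so no scalar penalty weight is ever tuned. The stand-in objectives `f1` (a proxy for model return) and `f2` (a proxy for negative model uncertainty), and all names here, are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def dominates(a, b):
    """True if score tuple a Pareto-dominates b (both objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def bi_objective_es(f1, f2, theta0, sigma=0.1, pop=20, iters=100, seed=0):
    """Toy (1+lambda) evolution strategy: a Gaussian-perturbed candidate
    replaces the parent only if it weakly improves both objectives and
    strictly improves at least one (Pareto dominance)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    parent = (f1(theta), f2(theta))
    for _ in range(iters):
        for _ in range(pop):
            cand = theta + sigma * rng.standard_normal(theta.shape)
            scores = (f1(cand), f2(cand))
            if dominates(scores, parent):
                theta, parent = cand, scores
    return theta, parent

# Stand-in objectives: "model return" peaks at theta = 1,
# "low model uncertainty" peaks at theta = 0.
f1 = lambda th: -float(np.sum((th - 1.0) ** 2))
f2 = lambda th: -float(np.sum(th ** 2))
theta, scores = bi_objective_es(f1, f2, np.full(3, 2.0))
```

Because acceptance requires dominance, the search drifts toward the Pareto front between the two optima instead of collapsing onto whichever objective a hand-tuned weight would favor; the paper's actual method additionally adapts the second objective to the policy's learning dynamics.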
Pages: 570-581 (12 pages)