Offline Reinforcement Learning as Anti-exploration

Cited by: 0
Authors
Rezaeifar, Shideh [1 ]
Dadashi, Robert [2 ]
Vieillard, Nino [2 ,3 ]
Hussenot, Leonard [2 ,4 ]
Bachem, Olivier [2 ]
Pietquin, Olivier [2 ]
Geist, Matthieu [2 ]
Affiliations
[1] Univ Geneva, Geneva, Switzerland
[2] Google Res, Brain Team, Mountain View, CA USA
[3] Univ Lorraine, CNRS, INRIA, IECL, F-54000 Nancy, France
[4] Univ Lille, CNRS, INRIA, UMR 9189, CRIStAL, Villeneuve d'Ascq, France
Keywords
ALGORITHM;
DOI
Not available
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset, and practically extends some previous pessimism-based offline RL methods to a deep learning setting with arbitrary bonuses. We also connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our simple agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
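The scheme described in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch rendering of the anti-exploration reward: a small VAE is fit to the dataset's (state, action) pairs, and its reconstruction error is subtracted from the reward as a penalty. The class and parameter names (StateActionVAE, bonus_scale) are placeholders, not the authors' code; only the overall scheme (reward minus a VAE prediction-error bonus) follows the abstract.

```python
# Illustrative sketch only: subtract a VAE-reconstruction-error bonus from the
# reward, so out-of-support actions are penalized (the "anti-exploration" idea).
import torch
import torch.nn as nn

class StateActionVAE(nn.Module):
    """Small VAE over (state, action) pairs, trained on the offline dataset.

    Its reconstruction error serves as the prediction-based bonus b(s, a):
    pairs far from the dataset support reconstruct poorly, so their bonus is large.
    """
    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=256):
        super().__init__()
        in_dim = state_dim + action_dim
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # outputs [mu, log_var]
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        return self.decoder(z), mu, log_var

def anti_exploration_reward(vae, state, action, reward, bonus_scale=1.0):
    """Return r'(s, a) = r(s, a) - alpha * b(s, a).

    The sign-flipped counterpart of bonus-based exploration, where the bonus
    would instead be added. bonus_scale (alpha) is a hypothetical tuning knob.
    """
    with torch.no_grad():
        recon, _, _ = vae(state, action)
        target = torch.cat([state, action], dim=-1)
        bonus = ((recon - target) ** 2).mean(dim=-1)  # per-sample reconstruction error
    return reward - bonus_scale * bonus
```

The modified reward r' would then replace r in any standard offline actor-critic update, which is what lets the approach extend pessimism-based methods with an arbitrary prediction-based bonus.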
Pages: 8106-8114
Page count: 9