Offline Reinforcement Learning as Anti-exploration

Cited by: 0
Authors
Rezaeifar, Shideh [1 ]
Dadashi, Robert [2 ]
Vieillard, Nino [2 ,3 ]
Hussenot, Leonard [2 ,4 ]
Bachem, Olivier [2 ]
Pietquin, Olivier [2 ]
Geist, Matthieu [2 ]
Affiliations
[1] Univ Geneva, Geneva, Switzerland
[2] Google Res, Brain Team, Mountain View, CA USA
[3] Univ Lorraine, CNRS, INRIA, IECL, F-54000 Nancy, France
[4] Univ Lille, CNRS, INRIA, UMR 9189, CRIStAL, Villeneuve d'Ascq, France
Keywords
ALGORITHM
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Offline Reinforcement Learning (RL) aims at learning an optimal control policy from a fixed dataset, without further interaction with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it as is done for exploration. This allows the policy to stay close to the support of the dataset, and practically extends some previous pessimism-based offline RL methods to a deep learning setting with arbitrary bonuses. We also connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our simple agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
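To make the mechanism concrete, below is a minimal sketch in PyTorch of how such an anti-exploration penalty can be wired into a critic update. This is an illustrative assumption, not the authors' released code: all names, network sizes, and the weight alpha are hypothetical. A conditional VAE is fit to the dataset's state-action pairs, its reconstruction error serves as the bonus b(s, a), and the bonus is subtracted from the reward in the Bellman target rather than added.

```python
# Sketch (assumed, not the paper's code) of anti-exploration: a conditional
# VAE is fit to the dataset's (state, action) pairs, its reconstruction error
# acts as the novelty bonus b(s, a), and the bonus is SUBTRACTED from the
# reward in the Bellman target instead of added as in bonus-based exploration.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    """Toy VAE over actions given states; all layer sizes are illustrative."""
    def __init__(self, state_dim, action_dim, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.log_std = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, s, a):
        h = self.enc(torch.cat([s, a], dim=-1))
        mu, std = self.mu(h), self.log_std(h).exp()
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.dec(torch.cat([s, z], dim=-1)), mu, std

def bonus(vae, s, a):
    """Prediction error of the VAE: large for (s, a) outside the data support."""
    with torch.no_grad():
        a_rec, _, _ = vae(s, a)
    return ((a_rec - a) ** 2).mean(dim=-1, keepdim=True)

def pessimistic_target(vae, q_target, pi, s, a, r, s_next, gamma=0.99, alpha=1.0):
    """Critic target r - alpha * b(s, a) + gamma * Q'(s', pi(s')):
    the exploration bonus is subtracted where an explorer would add it."""
    with torch.no_grad():
        a_next = pi(s_next)
        q_next = q_target(torch.cat([s_next, a_next], dim=-1))
    return r - alpha * bonus(vae, s, a) + gamma * q_next
```

The weight alpha trades pessimism against return maximization: the larger it is, the more strongly out-of-support actions are penalized and the closer the learned policy stays to the data, which is the link to behavior-regularization methods mentioned in the abstract.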
Pages: 8106 - 8114 (9 pages)
Related papers
50 records in total
  • [21] Offline reinforcement learning with task hierarchies
    Schwab, Devin
    Ray, Soumya
    Machine Learning, 2017, 106: 1569 - 1598
  • [22] Offline Reinforcement Learning at Multiple Frequencies
    Burns, Kaylee
    Yu, Tianhe
    Finn, Chelsea
    Hausman, Karol
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 2041 - 2051
  • [23] Survival Instinct in Offline Reinforcement Learning
    Li, Anqi
    Misra, Dipendra
    Kolobov, Andrey
    Cheng, Ching-An
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [24] A DATASET PERSPECTIVE ON OFFLINE REINFORCEMENT LEARNING
    Schweighofer, Kajetan
    Radler, Andreas
    Dinu, Marius-Constantin
    Hofmarcher, Markus
    Patil, Vihang
    Bitto-Nemling, Angela
    Eghbal-zadeh, Hamid
    Hochreiter, Sepp
    CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199
  • [25] Offline Reinforcement Learning for Mobile Notifications
    Yuan, Yiping
    Muralidharan, Ajith
    Nandy, Preetam
    Cheng, Miao
    Prabhakar, Prakruthi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 3614 - 3623
  • [26] Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration
    Huang, Zhenbo
    Sun, Shiliang
    Zhao, Jing
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [27] Learning to Influence Human Behavior with Offline Reinforcement Learning
    Hong, Joey
    Levine, Sergey
    Dragan, Anca
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [28] A Review of Offline Reinforcement Learning Based on Representation Learning
    Wang, X.-S.
    Wang, R.-R.
    Cheng, Y.-H.
    Zidonghua Xuebao/Acta Automatica Sinica, 2024, 50 (06): 1104 - 1128
  • [29] Discrete Uncertainty Quantification For Offline Reinforcement Learning
    Perez, Jose Luis
    Corrochano, Javier
    Garcia, Javier
    Majadas, Ruben
    Ibanez-Llano, Cristina
    Perez, Sergio
    Fernandez, Fernando
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2023, 13 (04) : 273 - 287
  • [30] Supported Value Regularization for Offline Reinforcement Learning
    Mao, Yixiu
    Zhang, Hongchang
    Chen, Chen
    Xu, Yi
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,