Continuous Doubly Constrained Batch Reinforcement Learning

Cited by: 0
Authors
Fakoor, Rasool [1]
Mueller, Jonas [1]
Asadi, Kavosh [1]
Chaudhari, Pratik [1,2]
Smola, Alexander J. [1]
Affiliations
[1] Amazon Web Services, Seattle, WA 98109, USA
[2] University of Pennsylvania, Philadelphia, PA 19104, USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021, Vol. 34
Keywords
NEURAL-NETWORKS; GO;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, where exploration can be too expensive to allow. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation error when our candidate policies diverge from the one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
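The abstract's "two straightforward penalties" can be illustrated with a toy objective. The sketch below is a hypothetical illustration only, not the paper's actual CDC algorithm: the function name `doubly_constrained_loss`, the `value_cap` parameter, and the exact penalty forms (a KL term toward the behavior policy, plus a squared penalty on Q-values exceeding a conservative cap) are assumptions made for exposition.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def doubly_constrained_loss(q_pred, q_target, pi_logits, mu_logits,
                            value_cap, lam_policy=1.0, lam_value=1.0):
    """Toy 'doubly constrained' batch-RL objective (illustrative only):
    TD regression + policy-divergence penalty + optimism penalty."""
    # 1) Standard TD regression term on the offline batch.
    td = (q_pred - q_target) ** 2

    # 2) Policy constraint: KL(pi || mu) keeps the learned policy pi
    #    close to the behavior policy mu that generated the data.
    pi = softmax(pi_logits)
    mu = softmax(mu_logits)
    kl = sum(p * math.log(p / q) for p, q in zip(pi, mu))

    # 3) Value constraint: penalize Q estimates above a conservative cap,
    #    discouraging overly optimistic extrapolation.
    optimism = max(q_pred - value_cap, 0.0) ** 2

    return td + lam_policy * kl + lam_value * optimism
```

With matched target, matched policies, and a Q-value under the cap, the loss is zero; diverging from the behavior policy or exceeding the cap strictly increases it, mirroring how the two constraints discourage departure from the data-generating policy and optimistic value estimates.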
Pages: 14