Continuous Doubly Constrained Batch Reinforcement Learning

Cited by: 0
Authors
Fakoor, Rasool [1]
Mueller, Jonas [1]
Asadi, Kavosh [1]
Chaudhari, Pratik [1, 2]
Smola, Alexander J. [1]
Affiliations
[1] Amazon Web Serv, Seattle, WA 98109 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021, Vol. 34
Keywords
NEURAL-NETWORKS; GO;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
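The abstract names two penalties but this record does not give their formulas. The following is a minimal sketch of what such terms could look like, not the authors' implementation. It assumes the policy-constraint is a squared distance between the policy's actions and the logged actions, and that the value-constraint penalizes only the positive gap between the value estimate at the policy's action and the estimate at the logged action. The function names and both functional forms are illustrative assumptions.

```python
import numpy as np


def policy_penalty(policy_actions, data_actions):
    """Assumed policy-constraint: mean squared distance between the
    actions the policy proposes and the actions logged in the batch."""
    diff = np.asarray(policy_actions, dtype=float) - np.asarray(data_actions, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def value_penalty(q_policy, q_data):
    """Assumed value-constraint: mean squared positive gap, so only
    estimates that exceed the value at the logged action are penalized."""
    gap = np.asarray(q_policy, dtype=float) - np.asarray(q_data, dtype=float)
    return float(np.mean(np.maximum(gap, 0.0) ** 2))


# Toy batch of two states with 2-D actions (made-up numbers).
data_actions = [[0.0, 0.0], [1.0, 0.0]]
policy_actions = [[0.0, 0.0], [1.0, 1.0]]
q_data = [2.0, 1.0]     # value estimates at the logged actions
q_policy = [1.0, 3.0]   # value estimates at the policy's actions

print(policy_penalty(policy_actions, data_actions))  # 0.5
print(value_penalty(q_policy, q_data))               # 2.0
```

In this toy batch only the second state contributes to either penalty: the policy's action there departs from the logged one, and its value estimate is higher than the estimate at the logged action.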
Pages: 14
Related papers
50 in total
  • [21] Batch Prioritization in Multigoal Reinforcement Learning
    Vecchietti, Luiz Felipe
    Kim, Taeyoung
    Choi, Kyujin
    Hong, Junhee
    Har, Dongsoo
    IEEE ACCESS, 2020, 8 : 137449 - 137461
  • [22] Batch reinforcement learning with state importance
    Li, LH
    Bulitko, V
    Greiner, R
    MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 : 566 - 568
  • [23] Reinforcement learning for a class of continuous-time input constrained optimal control problems
    Yaghmaie, Farnaz Adib
    Braun, David J.
    AUTOMATICA, 2019, 99 : 221 - 227
  • [24] Embedding active learning in batch-to-batch optimization using reinforcement learning
    Byun, Ha-Eun
    Kim, Boeun
    Lee, Jay H.
    AUTOMATICA, 2023, 157
  • [25] A latent batch-constrained deep reinforcement learning approach for precision dosing clinical decision support
    Qiu, Xihe
    Tan, Xiaoyu
    Li, Qiong
    Chen, Shaotao
    Ru, Yajun
    Jin, Yaochu
    KNOWLEDGE-BASED SYSTEMS, 2022, 237
  • [26] A Doubly Robust Approach to Sparse Reinforcement Learning
    Kim, Wonyoung
    Iyengar, Garud
    Zeevi, Assaf
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [27] Anytime-Constrained Reinforcement Learning
    McMahan, Jeremy
    Zhu, Xiaojin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [28] Evolving Constrained Reinforcement Learning Policy
    Hu, Chengpeng
    Pei, Jiyuan
    Liu, Jialin
    Yao, Xin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [29] Two Steps Reinforcement Learning in Continuous Reinforcement Learning Tasks
    Lopez-Bueno, Ivan
    Garcia, Javier
    Fernandez, Fernando
    BIO-INSPIRED SYSTEMS: COMPUTATIONAL AND AMBIENT INTELLIGENCE, PT 1, 2009, 5517 : 577 - 584
  • [30] Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability
    Jung, Whiyoung
    Cho, Myungsik
    Park, Jongeui
    Sung, Youngchul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022