Continuous Doubly Constrained Batch Reinforcement Learning

Authors
Fakoor, Rasool [1 ]
Mueller, Jonas [1 ]
Asadi, Kavosh [1 ]
Chaudhari, Pratik [1 ,2 ]
Smola, Alexander J. [1 ]
Affiliations
[1] Amazon Web Serv, Seattle, WA 98109 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
NEURAL-NETWORKS; GO;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
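The two penalties described in the abstract can be sketched in a minimal, illustrative form. This is not the authors' implementation: the function names, the diagonal-Gaussian policy assumption, and the mean/min mixing used for pessimism are assumptions introduced here to make the idea concrete; the paper's actual penalties differ in detail.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussians; used to measure how far
    a candidate policy p strays from the behavior policy q."""
    return float(np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
        - 0.5
    ))

def penalized_actor_loss(q_value, kl_to_behavior, alpha=1.0):
    """Policy-constraint: maximize the estimated value of the policy's
    action while penalizing divergence from the data-generating policy."""
    return -q_value + alpha * kl_to_behavior

def penalized_td_target(reward, next_q_samples, gamma=0.99, lam=0.5):
    """Value-constraint: a pessimistic bootstrap target that mixes the
    mean of sampled next-action values with their minimum, discouraging
    overly optimistic value estimates on poorly covered actions."""
    next_q_samples = np.asarray(next_q_samples, dtype=float)
    pessimistic_q = (1 - lam) * next_q_samples.mean() + lam * next_q_samples.min()
    return reward + gamma * pessimistic_q
```

In an offline training loop, `penalized_td_target` would replace the usual max-based Bellman target for the critic, and `penalized_actor_loss` would replace the plain policy-gradient objective for the actor; `alpha` and `lam` trade off fidelity to the batch against exploitation of the learned value function.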
Pages: 14