Continuous Doubly Constrained Batch Reinforcement Learning

Authors
Fakoor, Rasool [1 ]
Mueller, Jonas [1 ]
Asadi, Kavosh [1 ]
Chaudhari, Pratik [1 ,2 ]
Smola, Alexander J. [1 ]
Affiliations
[1] Amazon Web Serv, Seattle, WA 98109 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
NEURAL-NETWORKS; GO;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
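The two penalties described in the abstract can be sketched in a minimal, illustrative form. This is not the authors' implementation: the function names, the diagonal-Gaussian policy assumption, and the mean/min mixing used for pessimism are assumptions introduced here to make the idea concrete; the paper's actual penalties differ in detail.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussians; used to measure how far
    a candidate policy p strays from the behavior policy q."""
    return float(np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
        - 0.5
    ))

def penalized_actor_loss(q_value, kl_to_behavior, alpha=1.0):
    """Policy-constraint: maximize the estimated value of the policy's
    action while penalizing divergence from the data-generating policy."""
    return -q_value + alpha * kl_to_behavior

def penalized_td_target(reward, next_q_samples, gamma=0.99, lam=0.5):
    """Value-constraint: a pessimistic bootstrap target that mixes the
    mean of sampled next-action values with their minimum, discouraging
    overly optimistic value estimates on poorly covered actions."""
    next_q_samples = np.asarray(next_q_samples, dtype=float)
    pessimistic_q = (1 - lam) * next_q_samples.mean() + lam * next_q_samples.min()
    return reward + gamma * pessimistic_q
```

In an offline training loop, `penalized_td_target` would replace the usual max-based Bellman target for the critic, and `penalized_actor_loss` would replace the plain policy-gradient objective for the actor; `alpha` and `lam` trade off fidelity to the batch against exploitation of the learned value function.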
Pages: 14