Continuous Doubly Constrained Batch Reinforcement Learning

Authors
Fakoor, Rasool [1]
Mueller, Jonas [1]
Asadi, Kavosh [1]
Chaudhari, Pratik [1, 2]
Smola, Alexander J. [1]
Affiliations
[1] Amazon Web Services, Seattle, WA 98109 USA
[2] University of Pennsylvania, Philadelphia, PA 19104 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Volume 34
Keywords
NEURAL-NETWORKS; GO;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.
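The two penalties named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared-distance form of the policy-constraint, the squared positive gap used for the value-constraint, and every name below (`policy_penalty`, `value_penalty`, `critic_loss`, `actor_loss`, `lam`, `eta`) are assumptions chosen for illustration only.

```python
from typing import Sequence


def policy_penalty(policy_action: Sequence[float], data_action: Sequence[float]) -> float:
    """Assumed policy-constraint: squared distance between the action chosen
    by the learned policy and the action logged in the offline dataset."""
    return sum((p - d) ** 2 for p, d in zip(policy_action, data_action))


def value_penalty(q_policy_action: float, q_data_action: float) -> float:
    """Assumed value-constraint: squared positive part of
    Q(s, policy action) - Q(s, logged action), so only estimates that
    exceed the value of the logged action are penalized."""
    gap = max(0.0, q_policy_action - q_data_action)
    return gap ** 2


def critic_loss(q_data_action: float, td_target: float,
                q_policy_action: float, eta: float) -> float:
    """Squared TD error plus the weighted value penalty (eta is hypothetical)."""
    td_error = (q_data_action - td_target) ** 2
    return td_error + eta * value_penalty(q_policy_action, q_data_action)


def actor_loss(q_policy_action: float, policy_action: Sequence[float],
               data_action: Sequence[float], lam: float) -> float:
    """Negative Q-value plus the weighted policy penalty (lam is hypothetical)."""
    return -q_policy_action + lam * policy_penalty(policy_action, data_action)


# Toy numbers for a single transition, chosen only to exercise the functions.
example_critic = critic_loss(q_data_action=1.0, td_target=1.5,
                             q_policy_action=3.0, eta=0.5)
example_actor = actor_loss(q_policy_action=3.0, policy_action=[0.5, -0.5],
                           data_action=[0.0, 0.0], lam=2.0)
```

In this sketch, raising `lam` pulls the policy toward the logged actions, while raising `eta` pushes down value estimates for actions the dataset does not support; setting both to zero recovers an unconstrained actor-critic objective.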
Pages: 14