Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Times cited: 0
Authors
Zhang, Jing [1 ]
Zhang, Chi [2 ]
Wang, Wenjia [1 ,3 ]
Jing, Bing-Yi [4 ]
Affiliations
[1] HKUST, Hong Kong, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
[3] HKUST GZ, Guangzhou, Peoples R China
[4] SUSTech, Shenzhen, Peoples R China
Keywords
Maximum-likelihood estimation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because they cannot interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating values at out-of-distribution (OOD) points. Existing approaches either constrain the policy to exclude OOD actions or make the Q-function pessimistic; however, they can be overly conservative or fail to identify OOD regions accurately. To overcome this problem, we propose Constrained Policy optimization with Explicit Behavior density (CPED), which uses a flow-GAN model to explicitly estimate the density of the behavior policy. With this explicit density estimate, CPED can accurately identify the safe region and optimize the policy within it, yielding less conservative learned policies. We further provide theoretical results for the flow-GAN estimator and a performance guarantee for CPED, showing that CPED can find the optimal Q-function value. Empirically, CPED outperforms existing alternatives on various standard offline RL tasks, achieving higher expected returns.
Pages: 15
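
To make the method described in the abstract more concrete, the sketch below shows one way a density-constrained actor update could be written. It is a minimal illustration, not the authors' implementation: the paper's flow-GAN behavior-density estimator is replaced here by a conditional diagonal-Gaussian model fit by maximum likelihood, and the class and argument names (BehaviorDensity, Actor, Critic, log_density_eps, lam) are hypothetical.

# Illustrative sketch only, not the authors' released code. The flow-GAN
# density estimator from the paper is approximated by a conditional
# diagonal-Gaussian model; all names below are hypothetical.
import torch
import torch.nn as nn


class BehaviorDensity(nn.Module):
    # Stand-in for the explicit behavior-density estimator: log p_beta(a | s).
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * action_dim),
        )

    def log_prob(self, state, action):
        mean, log_std = self.net(state).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.clamp(-5.0, 2.0).exp())
        return dist.log_prob(action).sum(dim=-1)


class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)


def actor_loss(actor, critic, density, state, log_density_eps=-5.0, lam=1.0):
    # Maximize Q(s, pi(s)) while penalizing actions whose estimated behavior
    # density falls below a threshold, i.e. actions outside the "safe region".
    action = actor(state)
    q_value = critic(state, action)
    violation = (log_density_eps - density.log_prob(state, action)).clamp(min=0.0)
    return (-q_value + lam * violation).mean()

In such a setup, the density model would first be fit on the logged state-action pairs (by maximizing its log_prob over the dataset), and actor_loss would be minimized alongside a standard Bellman update for the critic; the threshold log_density_eps then controls how far the learned policy may move outside the estimated support of the behavior policy.
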
Related papers
50 items in total
  • [1] Implicit and Explicit Policy Constraints for Offline Reinforcement Learning
    Liu, Yang
    Hofert, Marius
    [J]. CAUSAL LEARNING AND REASONING, VOL 236, 2024, 236 : 499 - 513
  • [2] Supported Policy Optimization for Offline Reinforcement Learning
    Wu, Jialong
    Wu, Haixu
    Qiu, Zihan
    Wang, Jianmin
    Long, Mingsheng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [3] Constrained Offline Policy Optimization
    Polosky, Nicholas
    da Silva, Bruno C.
    Fiterau, Madalina
    Jagannath, Jithin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [4] Constrained Variational Policy Optimization for Safe Reinforcement Learning
    Liu, Zuxin
    Cen, Zhepeng
    Isenbaev, Vladislav
    Liu, Wei
    Wu, Zhiwei Steven
    Li, Bo
    Zhao, Ding
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [5] CVaR-Constrained Policy Optimization for Safe Reinforcement Learning
    Zhang, Qiyuan
    Leng, Shu
    Ma, Xiaoteng
    Liu, Qihan
    Wang, Xueqian
    Liang, Bin
    Liu, Yu
    Yang, Jun
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024: 1 - 12
  • [6] Learning Behavior of Offline Reinforcement Learning Agents
    Shukla, Indu
    Dozier, Haley R.
    Henslee, Althea C.
    [J]. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051
  • [7] Doubly constrained offline reinforcement learning for learning path recommendation
    Yun, Yue
    Dai, Huan
    An, Rui
    Zhang, Yupei
    Shang, Xuequn
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [8] Density Constrained Reinforcement Learning
    Qin, Zengyi
    Chen, Yuxiao
    Fan, Chuchu
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [9] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
    Yang, Yijun
    Jiang, Jing
    Wang, Zhuowei
    Duan, Qiqi
    Shi, Yuhui
    [J]. AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581