Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Cited by: 0
Authors
Zhang, Jing [1]
Zhang, Chi [2]
Wang, Wenjia [1,3]
Jing, Bing-Yi [4]
Affiliations
[1] HKUST, Hong Kong, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
[3] HKUST GZ, Guangzhou, Peoples R China
[4] SUSTech, Shenzhen, Peoples R China
Keywords
Maximum-likelihood estimation
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Because it cannot interact with the environment, offline reinforcement learning (RL) faces the challenge of value estimation at out-of-distribution (OOD) points. Existing methods address this issue either by constraining the policy to exclude OOD actions or by making the Q-function pessimistic. However, these methods can be overly conservative or fail to identify OOD regions accurately. To overcome this problem, we propose Constrained Policy optimization with Explicit Behavior density (CPED), which uses a flow-GAN model to explicitly estimate the density of the behavior policy. With this explicit density estimate, CPED can accurately identify the safe region and optimize within it, resulting in less conservative learned policies. We further provide theoretical results for the flow-GAN estimator and a performance guarantee for CPED, showing that CPED can find the optimal Q-function value. Empirically, CPED outperforms existing alternatives on various standard offline RL tasks, yielding higher expected returns.
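To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' code) of a CPED-style actor update: the actor maximizes the critic's Q-value while a penalty keeps its actions inside the high-density "safe" region of an estimated behavior density. The names `BehaviorDensity`, `density_threshold`, and `penalty_weight` are illustrative assumptions; the paper's flow-GAN estimator is replaced here by a placeholder network assumed to output log mu(a|s).

```python
# Hedged sketch of constrained policy optimization with an explicit behavior density.
# Assumes a pretrained density model; all hyperparameter values are placeholders.
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6  # e.g. a MuJoCo-style locomotion task

actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))

class BehaviorDensity(nn.Module):
    """Stand-in for the flow-GAN estimator: maps (s, a) to an estimated log mu(a|s)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

density = BehaviorDensity()      # assumed pretrained on the offline dataset
density_threshold = -5.0         # log-density defining the "safe" region (assumed)
penalty_weight = 10.0            # constraint strength (assumed)
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

states = torch.randn(256, state_dim)  # a batch of states from the offline dataset
actions = actor(states)
q_values = critic(torch.cat([states, actions], dim=-1))
log_mu = density(states, actions)

# Maximize Q while penalizing actions that fall below the estimated density threshold.
constraint_violation = torch.relu(density_threshold - log_mu)
loss = -q_values.mean() + penalty_weight * constraint_violation.mean()

opt.zero_grad()
loss.backward()
opt.step()
```

In this sketch the constraint is enforced with a fixed penalty for brevity; because the density is estimated explicitly rather than bounded implicitly, the safe region can be identified more precisely, which is the source of the less conservative behavior described in the abstract.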
Pages: 15
Related papers
50 in total
  • [31] Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning
    Tian, Chang
    Liu, An
    Huang, Guan
    Luo, Wu
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 1609 - 1624
  • [32] Policy Optimization for Continuous Reinforcement Learning
    Zhao, Hanyang
    Tang, Wenpin
    Yao, David D.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets
    Yang, K.
    Shi, C.
    Shen, C.
    Yang, J.
    Yeh, S.
    Sydir, J. J.
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2024, 23 (10) : 1 - 1
  • [34] Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning
    Kidera, Shunya
    Shintani, Kosuke
    Tsuneda, Toi
    Yamane, Satoshi
    IEEE ACCESS, 2024, 12 : 19942 - 19951
  • [35] Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
    Zhang, Siyuan
    Jiang, Nan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [36] Offline Reinforcement Learning with On-Policy Q-Function Regularization
    Shi, Laixi
    Dadashi, Robert
    Chi, Yuejie
    Castro, Pablo Samuel
    Geist, Matthieu
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 455 - 471
  • [37] Offline Reinforcement Learning with Pseudometric Learning
    Dadashi, Robert
    Rezaeifar, Shideh
    Vieillard, Nino
    Hussenot, Leonard
    Pietquin, Olivier
    Geist, Matthieu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Offline Reinforcement Learning with Uncertainty Critic Regularization Based on Density Estimation
    Li, Chao
    Wu, Fengge
    Zhao, Junsuo
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [39] Constrained Reinforcement Learning for Dynamic Optimization under Uncertainty
    Petsagkourakis, P.
    Sandoval, I. O.
    Bradford, E.
    Zhang, D.
    del Rio-Chanona, E. A.
    IFAC PAPERSONLINE, 2020, 53 (02): : 11264 - 11270
  • [40] Benchmarking Offline Reinforcement Learning
    Tittaferrante, Andrew
    Yassine, Abdulsalam
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 259 - 263