Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Cited by: 0
Authors
Zhang, Jing [1 ]
Zhang, Chi [2 ]
Wang, Wenjia [1 ,3 ]
Jing, Bing-Yi [4 ]
Affiliations
[1] HKUST, Hong Kong, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
[3] HKUST GZ, Guangzhou, Peoples R China
[4] SUSTech, Shenzhen, Peoples R China
Keywords
MAXIMUM-LIKELIHOOD-ESTIMATION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because they cannot interact with the environment, offline reinforcement learning (RL) methods face the challenge of value estimation at Out-of-Distribution (OOD) points. Existing methods for addressing this issue either constrain the policy to exclude OOD actions or make the Q-function pessimistic. However, these methods can be overly conservative or fail to identify OOD areas accurately. To overcome this problem, we propose Constrained Policy optimization with Explicit Behavior density (CPED), which uses a flow-GAN model to explicitly estimate the density of the behavior policy. With an explicit density estimate, CPED can accurately identify the safe region and optimize within it, yielding less conservative learned policies. We further provide theoretical results for the flow-GAN estimator and a performance guarantee for CPED, showing that CPED can find the optimal Q-function value. Empirically, CPED outperforms existing alternatives on various standard offline RL tasks, yielding higher expected returns.
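To make the density-constrained idea concrete, below is a minimal illustrative sketch, not code from the paper: the actor maximizes Q while a hinge penalty discourages actions whose estimated behavior log-density falls below a threshold. In the paper a flow-GAN (a normalizing-flow generator trained with a GAN objective, which admits exact likelihoods) supplies that density; here it is stubbed with a fixed Gaussian, and all names and hyperparameters (log_eps, lam, network sizes) are hypothetical.

```python
# Hypothetical sketch of density-constrained policy optimization in the spirit
# of CPED; NOT the authors' implementation. A flow-GAN would supply
# log_density; a standard normal stands in so the script runs end to end.
import torch
import torch.nn as nn

def log_density(actions):
    # Placeholder for the flow-GAN's explicit log-density of the behavior
    # policy, evaluated at the candidate actions.
    return torch.distributions.Normal(0.0, 1.0).log_prob(actions).sum(-1)

class Policy(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, states):
        return self.net(states)

def density_constrained_actor_loss(policy, q_fn, states,
                                   log_eps=-4.0, lam=10.0):
    """Maximize Q while keeping actions in the estimated safe region
    {a : log p_behavior(a) >= log_eps}. The constraint is relaxed into a
    hinge penalty (an assumed Lagrangian-style relaxation, for illustration).
    """
    actions = policy(states)
    q_term = q_fn(states, actions).mean()
    density_violation = torch.relu(log_eps - log_density(actions)).mean()
    return -q_term + lam * density_violation

# Toy usage: a stand-in critic and one gradient step.
torch.manual_seed(0)
policy = Policy(state_dim=3, action_dim=2)
q_fn = lambda s, a: -((a - 0.5) ** 2).sum(-1)  # stand-in Q-function
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
states = torch.randn(32, 3)
loss = density_constrained_actor_loss(policy, q_fn, states)
opt.zero_grad(); loss.backward(); opt.step()
print(f"actor loss: {loss.item():.4f}")
```

The key design point illustrated is that an explicit density lets the constraint be a simple thresholded log-likelihood, rather than an implicit divergence penalty that can over-restrict the policy.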
Pages: 15
Related Papers
50 items in total
  • [21] Offline Reinforcement Learning With Reverse Diffusion Guide Policy
    Zhang, Jiazhi
    Cheng, Yuhu
    Cao, Shuo
    Wang, Xuesong
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024: 1 - 9
  • [22] Inverse Reinforcement Learning with Explicit Policy Estimates
    Sanghvi, Navyata
    Usami, Shinnosuke
    Sharma, Mohit
    Groeger, Joachim
    Kitani, Kris
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9472 - 9480
  • [23] Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization
    Wen, Lu
    Duan, Jingliang
    Li, Shengbo Eben
    Xu, Shaobing
    Peng, Huei
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [24] Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning
    Li, Weifan
    Zhu, Yuanheng
    Zhao, Dongbin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [25] LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning
    Chen, Xi
    Ghadirzadeh, Ali
    Yu, Tianhe
    Wang, Jianhao
    Gao, Yuan
    Li, Wenzhe
    Liang, Bin
    Finn, Chelsea
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [26] Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration
Huang, Z.
Sun, S.
Zhao, J.
KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [27] Constrained Policy Improvement for Efficient Reinforcement Learning
    Sarafian, Elad
    Tamar, Aviv
    Kraus, Sarit
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2863 - 2871
  • [28] A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
    Hong, Kihyuk
    Li, Yuhang
    Tewari, Ambuj
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024
  • [29] Energy-Based Policy Constraint for Offline Reinforcement Learning
    Peng, Zhiyong
    Han, Changlin
    Liu, Yadong
    Zhou, Zongtan
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT II, 2024, 14474 : 335 - 346
  • [30] A Policy-Guided Imitation Approach for Offline Reinforcement Learning
    Xu, Haoran
    Jiang, Li
    Li, Jianxiong
    Zhan, Xianyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,