Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Cited by: 0

Authors:

Zhang, Jing [1]

Zhang, Chi [2]

Wang, Wenjia [1,3]

Jing, Bing-Yi [4]

Affiliations:

[1] HKUST, Hong Kong, Peoples R China

[2] Kuaishou Technol, Beijing, Peoples R China

[3] HKUST GZ, Guangzhou, Peoples R China

[4] SUSTech, Shenzhen, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

MAXIMUM-LIKELIHOOD-ESTIMATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points. Existing methods for addressing this issue either control policy to exclude the OOD action or make the Q function pessimistic. However, these methods can be overly conservative or fail to identify OOD areas accurately. To overcome this problem, we propose a Constrained Policy optimization with Explicit Behavior density (CPED) method that utilizes a flow-GAN model to explicitly estimate the density of behavior policy. By estimating the explicit density, CPED can accurately identify the safe region and enable optimization within the region, resulting in less conservative learning policies. We further provide theoretical results for both the flow-GAN estimator and performance guarantee for CPED by showing that CPED can find the optimal Q-function value. Empirically, CPED outperforms existing alternatives on various standard offline reinforcement learning tasks, yielding higher expected returns.
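The idea of optimizing only inside a "safe region" defined by an explicit behavior density can be illustrated with a toy sketch. The code below is not the CPED training procedure and does not use a flow-GAN: as labeled assumptions, it substitutes a 1-D Gaussian for the learned behavior density, uses a hypothetical Q-function that grows with the action, and picks the best action from a fixed grid of candidates whose log-density clears a chosen threshold. All function names and the threshold value are illustrative.

```python
import numpy as np

def gaussian_log_density(actions, mean, std):
    """Log-density of a 1-D Gaussian (assumed stand-in for a learned behavior density)."""
    return -0.5 * ((actions - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def toy_q(actions):
    """Hypothetical Q-values that keep increasing with the action."""
    return actions

def constrained_greedy_action(candidates, log_density, q_values, log_threshold):
    """Return the highest-Q candidate whose behavior log-density is at least the threshold."""
    safe = log_density >= log_threshold
    if not safe.any():
        # No candidate lies in the safe region: fall back to the most likely action.
        return candidates[np.argmax(log_density)]
    masked_q = np.where(safe, q_values, -np.inf)
    return candidates[np.argmax(masked_q)]

# Candidate actions on a grid; behavior data is assumed to concentrate around 0.
candidates = np.linspace(-3.0, 3.0, 61)
log_p = gaussian_log_density(candidates, mean=0.0, std=0.5)
q = toy_q(candidates)

# Illustrative threshold: the log-density at |a| = 1.05, so the safe region is |a| <= 1.05.
log_threshold = gaussian_log_density(1.05, mean=0.0, std=0.5)

unconstrained = candidates[np.argmax(q)]  # chases high Q far outside the data
constrained = constrained_greedy_action(candidates, log_p, q, log_threshold)

print("unconstrained action:", unconstrained)
print("constrained action:", constrained)
```

In this toy setting the unconstrained choice lands at the edge of the grid, where the stand-in density is negligible, while the constrained choice stays at the boundary of the high-density region. A tighter threshold gives more conservative behavior and a looser one admits more OOD actions, which mirrors the trade-off the abstract describes.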


Pages: 15

50 items in total

[1] Implicit and Explicit Policy Constraints for Offline Reinforcement Learning
Liu, Yang
Hofert, Marius
[J]. CAUSAL LEARNING AND REASONING, VOL 236, 2024, 236 : 499 - 513
[2] Supported Policy Optimization for Offline Reinforcement Learning
Wu, Jialong
Wu, Haixu
Qiu, Zihan
Wang, Jianmin
Long, Mingsheng
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
[3] Constrained Offline Policy Optimization
Polosky, Nicholas
da Silva, Bruno C.
Fiterau, Madalina
Jagannath, Jithin
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[4] Constrained Variational Policy Optimization for Safe Reinforcement Learning
Liu, Zuxin
Cen, Zhepeng
Isenbaev, Vladislav
Liu, Wei
Wu, Zhiwei Steven
Li, Bo
Zhao, Ding
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[5] CVaR-Constrained Policy Optimization for Safe Reinforcement Learning
Zhang, Qiyuan
Leng, Shu
Ma, Xiaoteng
Liu, Qihan
Wang, Xueqian
Liang, Bin
Liu, Yu
Yang, Jun
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 12
[6] Learning Behavior of Offline Reinforcement Learning Agents
Shukla, Indu
Dozier, Haley. R.
Henslee, Althea. C.
[J]. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051
[7] Doubly constrained offline reinforcement learning for learning path recommendation
Yun, Yue
Dai, Huan
An, Rui
Zhang, Yupei
Shang, Xuequn
[J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
[8] Doubly constrained offline reinforcement learning for learning path recommendation
Yun, Yue
Dai, Huan
An, Rui
Zhang, Yupei
Shang, Xuequn
[J]. Knowledge-Based Systems, 2024, 284
[9] Density Constrained Reinforcement Learning
Qin, Zengyi
Chen, Yuxiao
Fan, Chuchu
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[10] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
Yang, Yijun
Jiang, Jing
Wang, Zhuowei
Duan, Qiqi
Shi, Yuhui
[J]. AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581
