QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0

Authors:

Pang, Teng [1]

Wu, Guoqiang [1]

Zhang, Yan [1]

Wang, Bingzheng [1]

Yin, Yilong [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Vol. 158

Keywords:

Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL

DOI:

10.1016/j.patcog.2024.111032

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Offline reinforcement learning (RL) aims to learn an optimal policy from offline data. During policy learning, a typical approach constrains the target policy to the offline data to reduce extrapolation error. However, this constraint can impede the target policy's learning when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which addresses offline RL by strengthening the exploration of the behavior policy with constrained action perturbations. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches.
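The abstract does not spell out the perturbation procedure, so the following is only an illustrative sketch of the general idea of Q-guided, bounded action perturbation, not the authors' algorithm. It assumes a toy critic (`toy_q`), a finite-difference gradient in place of automatic differentiation, an L2 step budget `epsilon`, and an action range of [-1, 1]; all of these names and values are hypothetical.

```python
from typing import Callable, List, Sequence


def perturb_action(
    q_fn: Callable[[Sequence[float], Sequence[float]], float],
    state: Sequence[float],
    action: Sequence[float],
    epsilon: float = 0.1,   # assumed perturbation budget (L2 step length)
    low: float = -1.0,      # assumed lower action bound
    high: float = 1.0,      # assumed upper action bound
    h: float = 1e-4,        # finite-difference step size
) -> List[float]:
    """Move a dataset action a bounded step in the direction that raises Q."""
    # Central finite-difference estimate of dQ/da for each action dimension.
    grad = []
    for i in range(len(action)):
        up = list(action)
        up[i] += h
        down = list(action)
        down[i] -= h
        grad.append((q_fn(state, up) - q_fn(state, down)) / (2 * h))

    norm = sum(g * g for g in grad) ** 0.5
    if norm == 0.0:
        # No ascent direction available: keep the behavior action unchanged.
        return list(action)

    # Take a step of length epsilon along the normalized gradient,
    # then clip each coordinate to the valid action range.
    return [
        min(high, max(low, a + epsilon * g / norm))
        for a, g in zip(action, grad)
    ]


def toy_q(state: Sequence[float], action: Sequence[float]) -> float:
    # Hypothetical critic: Q is highest when the action equals the state.
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.2]
behavior_action = [0.0, 0.0]  # stands in for an action from the offline data
explored = perturb_action(toy_q, state, behavior_action, epsilon=0.1)
```

In this toy setting the perturbed action stays within `epsilon` of the behavior action while scoring higher under the critic. In a policy constraint method, such a perturbed action could replace the raw dataset action as the constraint target, which is one way to read the "plug-and-play" claim.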

Pages: 10
