QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0

Authors:

Pang, Teng [1]

Wu, Guoqiang [1]

Zhang, Yan [1]

Wang, Bingzheng [1]

Yin, Yilong [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Volume 158

Keywords:

Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL;

DOI:

10.1016/j.patcog.2024.111032

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Offline reinforcement learning (RL) aims to learn an optimal policy from offline data. During policy learning, a typical approach constrains the target policy to the offline data to reduce extrapolation errors. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which solves offline RL by strengthening the exploration of the behavior policy with constrained perturbation actions. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve performance. Experimental results on D4RL illustrate the effectiveness of our method when embedded into existing approaches.

Pages: 10

50 items in total

[1] Offline Reinforcement Learning with On-Policy Q-Function Regularization
Shi, Laixi
Dadashi, Robert
Chi, Yuejie
Castro, Pablo Samuel
Geist, Matthieu
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT IV, 2023, 14172 : 455 - 471
[2] Reinforcement learning via approximation of the Q-function
Langlois, Marina
Sloan, Robert H.
JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2010, 22 (03) : 219 - 235
[3] Offline Reinforcement Learning as Anti-exploration
Rezaeifar, Shideh
Dadashi, Robert
Vieillard, Nino
Hussenot, Leonard
Bachem, Olivier
Pietquin, Olivier
Geist, Matthieu
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8106 - 8114
[4] Learning Optimal Q-Function Using Deep Boltzmann Machine for Reliable Trading of Cryptocurrency
Bu, Seok-Jun
Cho, Sung-Bae
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2018, PT I, 2018, 11314 : 468 - 480
[5] Adaptable Conservative Q-Learning for Offline Reinforcement Learning
Qiu, Lyn
Li, Xu
Liang, Lenghan
Sun, Mingming
Yan, Junchi
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 200 - 212
[6] Mildly Conservative Q-Learning for Offline Reinforcement Learning
Lyu, Jiafei
Ma, Xiaoteng
Li, Xiu
Lu, Zongqing
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
[7] Reward-free offline reinforcement learning: Optimizing behavior policy via action exploration
Huang, Zhenbo
Sun, Shiliang
Zhao, Jing
KNOWLEDGE-BASED SYSTEMS, 2024, 299
[8] State-novelty guided action persistence in deep reinforcement learning
Hu, Jianshu
Weng, Paul
Ban, Yutong
MACHINE LEARNING, 2025, 114 (02)
[9] Learning Q-Function Approximations for Hybrid Control Problems
Menta, Sandeep
Warrington, Joseph
Lygeros, John
Morari, Manfred
IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 1364 - 1369
[10] Reinforcement Learning with an Ensemble of Binary Action Deep Q-Networks
Hafiz A.M.
Hassaballah M.
Alqahtani A.
Alsubai S.
Hameed M.A.
Computer Systems Science and Engineering, 2023, 46 (03) : 2651 - 2666
