QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0

Authors:

Pang, Teng [1]

Wu, Guoqiang [1]

Zhang, Yan [1]

Wang, Bingzheng [1]

Yin, Yilong [1]

Affiliation:

[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China

Source:

PATTERN RECOGNITION | 2025, Vol. 158

Keywords:

Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL;

DOI:

10.1016/j.patcog.2024.111032

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Offline reinforcement learning (RL) aims to learn an optimal policy from a fixed offline dataset. During policy learning, a typical approach constrains the target policy to the offline data in order to reduce extrapolation error. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which addresses offline RL by strengthening the exploration of the behavior policy through constrained action perturbations. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve their performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches.
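The abstract gives no algorithmic detail, so the following is only an illustrative sketch of what a Q-function guided, constrained action perturbation could look like; it is not the authors' QFAE implementation. The function name `q_guided_perturbation`, the toy critic `toy_q`, the parameters `epsilon`, `step`, `low`, `high`, and the use of a finite-difference gradient are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence


def q_guided_perturbation(
    q_fn: Callable[[Sequence[float], Sequence[float]], float],
    state: Sequence[float],
    action: Sequence[float],
    epsilon: float = 0.1,   # hypothetical bound on the per-dimension perturbation
    step: float = 0.05,     # hypothetical step size along the Q gradient
    low: float = -1.0,      # assumed lower action bound
    high: float = 1.0,      # assumed upper action bound
    fd_eps: float = 1e-4,
) -> List[float]:
    """Nudge a dataset action toward higher Q, staying within epsilon of it."""
    # Central finite-difference estimate of dQ/da for each action dimension.
    # (A deep RL agent would normally obtain this gradient via autodiff.)
    grad = []
    for i in range(len(action)):
        up = list(action)
        down = list(action)
        up[i] += fd_eps
        down[i] -= fd_eps
        grad.append((q_fn(state, up) - q_fn(state, down)) / (2 * fd_eps))

    perturbed = []
    for a, g in zip(action, grad):
        # Constrain the perturbation to [-epsilon, epsilon].
        delta = max(-epsilon, min(epsilon, step * g))
        # Keep the perturbed action inside the valid action range.
        perturbed.append(max(low, min(high, a + delta)))
    return perturbed


def toy_q(state: Sequence[float], action: Sequence[float]) -> float:
    # Hypothetical critic for the demo: Q is highest when action == state.
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.5]
dataset_action = [0.0, 0.0]
new_action = q_guided_perturbation(toy_q, state, dataset_action)
```

In this toy setting the perturbed action moves a small, bounded distance from the dataset action in the direction that increases the critic's value, which is one way to read "exploration under a suitable action perturbation" on top of a policy constraint method.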


Pages: 10

50 items in total

[41] Robust optimal control of the multi-input systems with unknown disturbance based on adaptive integral reinforcement learning Q-function
Lv, Yongfeng
Zhao, Jun
Li, Rong
Ren, Xuemei
INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34 (06) : 4234 - 4251
[42] Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
Shi, Laixi
Li, Gen
Wei, Yuting
Chen, Yuxin
Chi, Yuejie
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[43] ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning
Wu, Kun
Zhao, Yinuo
Xu, Zhiyuan
Che, Zhengping
Yin, Chengxiang
Liu, Chi Harold
Qiu, Qinru
Feng, Feifei
Tang, Jian
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
[44] PROJECTED STATE-ACTION BALANCING WEIGHTS FOR OFFLINE REINFORCEMENT LEARNING
Wang, Jiayi
Qi, Zhengling
Wong, Raymond K. W.
ANNALS OF STATISTICS, 2023, 51 (04) : 1639 - 1665
[45] Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy
Lv, Hui
Chen, Yadong
Li, Shibo
Zhu, Baolong
Li, Min
MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
[46] Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare
Tang, Shengpu
Makar, Maggie
Sjoding, Michael W.
Doshi-Velez, Finale
Wiens, Jenna
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[47] Safe Exploration of State and Action Spaces in Reinforcement Learning
Garcia, Javier
Fernandez, Fernando
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2012, 45 : 515 - 564
[48] Self-Regulating Action Exploration in Reinforcement Learning
Teng, Teck-Hou
Tan, Ah-Hwee
Tan, Yuan-Sin
PROCEEDINGS OF THE INTERNATIONAL NEURAL NETWORK SOCIETY WINTER CONFERENCE (INNS-WC2012), 2012, 13 : 18 - 30
[49] #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Tang, Haoran
Houthooft, Rein
Foote, Davis
Stooke, Adam
Chen, Xi
Duan, Yan
Schulman, John
De Turck, Filip
Abbeel, Pieter
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[50] Offline Reinforcement Learning of Robotic Control Using Deep Kinematics and Dynamics
Li, Xiang
Shang, Weiwei
Cong, Shuang
IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29 (04) : 2428 - 2439
