QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited: 0
Authors
Pang, Teng [1 ]
Wu, Guoqiang [1 ]
Zhang, Yan [1 ]
Wang, Bingzheng [1 ]
Yin, Yilong [1 ]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Keywords
Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL;
DOI
10.1016/j.patcog.2024.111032
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from offline data. During policy learning, a typical approach constrains the target policy to stay close to the offline data in order to reduce extrapolation error. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning, showing that a suitable action perturbation can improve policy learning. Motivated by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which solves offline RL by strengthening the exploration of the behavior policy with constrained action perturbation. Moreover, QFAE can be viewed as a plug-and-play framework that can be embedded into existing policy-constraint methods to improve their performance. Experimental results on the D4RL benchmark demonstrate the effectiveness of our method when embedded into existing approaches.
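As a rough illustration of the abstract's core idea only (this is not the paper's actual algorithm; the function names, the finite-difference Q-gradient, and the L2 constraint radius are all assumptions for this sketch), one minimal way to perturb a dataset action uphill on a learned Q-function while staying within a constraint region might look like:

```python
import numpy as np

def numerical_q_grad(q_fn, state, action, eps=1e-4):
    """Finite-difference gradient of Q(s, a) with respect to the action."""
    grad = np.zeros_like(action)
    for i in range(action.size):
        da = np.zeros_like(action)
        da[i] = eps
        grad[i] = (q_fn(state, action + da) - q_fn(state, action - da)) / (2 * eps)
    return grad

def perturb_action(q_fn, state, action, radius=0.1):
    """Move a dataset action a small step in the direction that increases Q,
    clipped so the perturbed action stays within an L2 ball of the original
    (a stand-in for the policy-constraint region)."""
    g = numerical_q_grad(q_fn, state, action)
    step = radius * g / (np.linalg.norm(g) + 1e-8)
    return np.clip(action + step, -1.0, 1.0)

# Toy example: this Q-function peaks where the action equals the state,
# so the perturbed action should move from the dataset action toward it.
q_fn = lambda s, a: -np.sum((a - s) ** 2)
state = np.array([0.5, -0.5])
dataset_action = np.array([0.0, 0.0])
better_action = perturb_action(q_fn, state, dataset_action, radius=0.1)
```

In a real offline RL pipeline the perturbed action would replace the raw dataset action inside the policy-constraint loss; here the perturbation merely demonstrates that a bounded, Q-guided shift yields an action with higher estimated value.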
Pages: 10
Related Papers (50 total)
  • [41] Lv, Yongfeng; Zhao, Jun; Li, Rong; Ren, Xuemei. Robust optimal control of the multi-input systems with unknown disturbance based on adaptive integral reinforcement learning Q-function. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2024, 34(06): 4234-4251.
  • [42] Shi, Laixi; Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie. Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022.
  • [43] Wu, Kun; Zhao, Yinuo; Xu, Zhiyuan; Che, Zhengping; Yin, Chengxiang; Liu, Chi Harold; Qiu, Qinru; Feng, Feifei; Tang, Jian. ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024.
  • [44] Wang, Jiayi; Qi, Zhengling; Wong, Raymond K. W. Projected State-Action Balancing Weights for Offline Reinforcement Learning. ANNALS OF STATISTICS, 2023, 51(04): 1639-1665.
  • [45] Lv, Hui; Chen, Yadong; Li, Shibo; Zhu, Baolong; Li, Min. Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy. MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35(05).
  • [46] Tang, Shengpu; Makar, Maggie; Sjoding, Michael W.; Doshi-Velez, Finale; Wiens, Jenna. Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
  • [47] Garcia, Javier; Fernandez, Fernando. Safe Exploration of State and Action Spaces in Reinforcement Learning. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2012, 45: 515-564.
  • [48] Teng, Teck-Hou; Tan, Ah-Hwee; Tan, Yuan-Sin. Self-Regulating Action Exploration in Reinforcement Learning. PROCEEDINGS OF THE INTERNATIONAL NEURAL NETWORK SOCIETY WINTER CONFERENCE (INNS-WC2012), 2012, 13: 18-30.
  • [49] Tang, Haoran; Houthooft, Rein; Foote, Davis; Stooke, Adam; Chen, Xi; Duan, Yan; Schulman, John; De Turck, Filip; Abbeel, Pieter. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30.
  • [50] Li, Xiang; Shang, Weiwei; Cong, Shuang. Offline Reinforcement Learning of Robotic Control Using Deep Kinematics and Dynamics. IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024, 29(04): 2428-2439.