QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0
Authors
Pang, Teng [1]
Wu, Guoqiang [1]
Zhang, Yan [1]
Wang, Bingzheng [1]
Yin, Yilong [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Keywords
Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL;
DOI
10.1016/j.patcog.2024.111032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from a fixed offline dataset. A typical approach constrains the target policy to the offline data during policy learning in order to reduce extrapolation error. However, this constraint can impede the learning ability of the target policy when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning; the analysis implies that a suitable action perturbation can improve policy learning. Inspired by this theoretical analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which addresses offline RL by strengthening the exploration of the behavior policy through constrained action perturbation. Moreover, it can be viewed as a plug-and-play framework that can be embedded into existing policy constraint methods to improve their performance. Experimental results on the D4RL benchmark illustrate the effectiveness of our method when embedded into existing approaches.
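The abstract describes the idea only at a high level, so the following is a minimal sketch of what a Q-guided, bounded action perturbation could look like, not the authors' algorithm. Everything named here is an assumption made for illustration: the function `q_guided_perturbation`, the toy Q-function `toy_q`, the perturbation bound `epsilon`, the number of sampled candidates, and the action range `[-1, 1]`. The sketch samples small perturbations of a dataset action and keeps the one the Q-function scores highest, falling back to the original action if none is better.

```python
import random


def q_guided_perturbation(q_fn, state, action, epsilon=0.1,
                          num_candidates=16, low=-1.0, high=1.0, rng=None):
    """Return `action` or a bounded perturbation of it that q_fn scores higher.

    Hypothetical helper for illustration only; the paper's actual
    procedure is not given in the abstract.
    """
    rng = rng or random.Random(0)
    best_action, best_q = list(action), q_fn(state, action)
    for _ in range(num_candidates):
        # Perturb each dimension by at most epsilon, then clip to the action range.
        candidate = [min(high, max(low, a + rng.uniform(-epsilon, epsilon)))
                     for a in action]
        q = q_fn(state, candidate)
        if q > best_q:  # keep a candidate only if the Q-value improves
            best_action, best_q = candidate, q
    return best_action


def toy_q(state, action):
    # Assumed toy Q-function: highest when the action equals the state.
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.2]
action = [0.3, 0.0]  # stands in for a (possibly suboptimal) dataset action
explored = q_guided_perturbation(toy_q, state, action, epsilon=0.1)
print(explored, toy_q(state, explored) >= toy_q(state, action))
```

Because the original action is always kept as a fallback, the returned action never has a lower Q-value than the dataset action, and the bound `epsilon` keeps it close to the data, which is the role the abstract assigns to the constraint.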
Pages: 10