QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0
Authors
Pang, Teng [1 ]
Wu, Guoqiang [1 ]
Zhang, Yan [1 ]
Wang, Bingzheng [1 ]
Yin, Yilong [1 ]
Affiliation
[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Keywords
Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL;
DOI
10.1016/j.patcog.2024.111032
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from a fixed offline dataset. A typical approach constrains the target policy to stay close to the offline data in order to reduce extrapolation error. However, this constraint can impede policy learning when the provided data are suboptimal. To address this issue, we analyze the impact of action exploration on policy learning and show that a suitably constrained action perturbation can improve it. Motivated by this analysis, we propose a simple yet effective method, Q-Function guided Action Exploration (QFAE), which solves offline RL by strengthening the exploration of the behavior policy with constrained perturbation actions. Moreover, QFAE is a plug-and-play framework that can be embedded into existing policy-constraint methods to improve their performance. Experimental results on the D4RL benchmark demonstrate the effectiveness of our method when embedded into existing approaches.
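The abstract does not spell out the perturbation rule, but a natural reading is that dataset actions are nudged along the Q-function's action gradient under a norm constraint before being used in a policy-constraint objective. The PyTorch sketch below illustrates that idea only; the function names, the step size `eps`, the norm bound `delta`, and the assumed action range [-1, 1] are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of Q-function guided, constraint-perturbed action
# exploration (QFAE-style). All hyperparameters and names are assumptions.
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Minimal state-action value network Q(s, a)."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def perturb_actions(q_net, states, actions, eps=0.05, delta=0.1):
    """Move dataset actions a small step along the Q-gradient, with the
    perturbation norm clipped to `delta` (the constraint)."""
    actions = actions.detach().requires_grad_(True)
    q = q_net(states, actions).sum()
    grad = torch.autograd.grad(q, actions)[0]          # dQ/da per sample
    step = eps * grad
    norm = step.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    step = step * torch.clamp(norm, max=delta) / norm  # clip step norm to delta
    return (actions + step).detach().clamp(-1.0, 1.0)  # assumed range [-1, 1]


if __name__ == "__main__":
    q = QNet(state_dim=3, action_dim=2)
    s = torch.randn(4, 3)
    a = torch.rand(4, 2) * 2 - 1
    print(perturb_actions(q, s, a).shape)  # torch.Size([4, 2])
```

In a policy-constraint method such as TD3+BC, the perturbed actions would replace the raw dataset actions in the behavior-cloning term, so the constraint pulls the policy toward slightly improved actions rather than toward the suboptimal data itself.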
Pages: 10