QFAE: Q-Function guided Action Exploration for offline deep reinforcement learning

Cited by: 0
Authors
Pang, Teng [1 ]
Wu, Guoqiang [1 ]
Zhang, Yan [1 ]
Wang, Bingzheng [1 ]
Yin, Yilong [1 ]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
Keywords
Deep reinforcement learning; Offline reinforcement learning; Policy constraints; Action exploration; D4RL
DOI
10.1016/j.patcog.2024.111032
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from previously collected data. A typical approach constrains the target policy to stay close to the offline data, which reduces extrapolation error but can impede learning when the provided data is suboptimal. To address this issue, we analyze the impact of action exploration on policy learning and show that a suitably constrained action perturbation can improve it. Motivated by this analysis, we propose a simple yet effective method named Q-Function guided Action Exploration (QFAE), which strengthens exploration around the behavior policy through constrained, Q-function-guided action perturbation. QFAE is a plug-and-play framework that can be embedded into existing policy-constraint methods to improve their performance. Experimental results on the D4RL benchmark demonstrate the effectiveness of our method when embedded into existing approaches.
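The abstract only sketches QFAE's mechanism, so the following is a minimal, hypothetical PyTorch sketch of one way a learned Q-function could guide a constrained action perturbation. The names and hyperparameters (q_net, step_size, epsilon) and the normalized-gradient-ascent form are illustrative assumptions, not the paper's actual update rule.

    import torch

    def perturb_actions(q_net, states, actions, step_size=0.05, epsilon=0.1):
        # Hypothetical sketch: nudge dataset actions uphill along the learned
        # Q-function, keeping the perturbation inside an epsilon-ball so the
        # result stays close to the behavior policy (the policy constraint).
        actions = actions.detach().requires_grad_(True)
        q_sum = q_net(states, actions).sum()           # scalar so autograd yields per-sample dQ/da
        grad = torch.autograd.grad(q_sum, actions)[0]  # direction of steepest Q improvement
        delta = step_size * grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)
        delta = delta.clamp(-epsilon, epsilon)         # constrain the perturbation magnitude
        return (actions + delta).clamp(-1.0, 1.0).detach()

In a plug-and-play use, the perturbed actions could replace the raw dataset actions as the constraint target of an existing method, e.g. the behavior-cloning term of a TD3+BC-style actor loss.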
Pages: 10
Related Papers (50 in total)
  • [21] Huang, Wenhui; Zhang, Cong; Wu, Jingda; He, Xiangkun; Zhang, Jie; Lv, Chen. Sampling Efficient Deep Reinforcement Learning Through Preference-Guided Stochastic Exploration. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 18553-18564.
  • [22] Gu, Pengjie; Zhao, Mengchen; Chen, Chen; Li, Dong; Hao, Jianye; An, Bo. Learning Pseudometric-based Action Representations for Offline Reinforcement Learning. International Conference on Machine Learning, Vol. 162, 2022.
  • [23] Luo, Jianlan; Dong, Perry; Wu, Jeffrey; Kumar, Aviral; Geng, Xinyang; Levine, Sergey. Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning. Conference on Robot Learning, Vol. 229, 2023.
  • [24] Farjadnasab, Milad; Babazadeh, Maryam. Model-free LQR design by Q-function learning. Automatica, 2022, 137.
  • [25] Huang, Bojun. Lagrangian Method for Q-Function Learning (with Applications to Machine Translation). International Conference on Machine Learning, Vol. 162, 2022.
  • [26] Liu, Fanghui; Viano, Luca; Cevher, Volkan. Understanding Deep Neural Function Approximation in Reinforcement Learning via ε-Greedy Exploration. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [27] Wu, Fan; Zhang, Rui; Yi, Qi; Gao, Yunkai; Guo, Jiaming; Peng, Shaohui; Lan, Siming; Han, Husheng; Pan, Yansong; Yuan, Kaizhao; Jin, Pengwei; Chen, Ruizhi; Chen, Yunji; Li, Ling. OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 14, 2024: 15897-15905.
  • [28] Majumdar, Abhijit; Benavidez, Patrick; Jamshidi, Mo. Multi-Agent Exploration for Faster and Reliable Deep Q-Learning Convergence in Reinforcement Learning. 2018 World Automation Congress (WAC), 2018: 222-227.
  • [29] Weissenbacher, Matthias; Sinha, Samarth; Garg, Animesh; Kawahara, Yoshinobu. Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics. International Conference on Machine Learning, Vol. 162, 2022.
  • [30] Shao, Jianzhun; Qu, Yun; Chen, Chen; Zhang, Hongchang; Ji, Xiangyang. Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.