Balanced prioritized experience replay in off-policy reinforcement learning

Citations: 0
Authors
Zhouwei Lou
Yiye Wang
Shuo Shan
Kanjian Zhang
Haikun Wei
Affiliations
[1] Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education
[2] School of Automation, Southeast University
Keywords
Balanced prioritized experience replay (BPER); Reinforcement learning (RL); Experience imbalance; Experience rarity
DOI
10.1007/s00521-024-09913-6
Abstract
In off-policy reinforcement learning (RL), learning performance can suffer from the experience imbalance problem: the experiences an agent collects during learning are unevenly distributed over the state space, so the agent cannot accurately estimate the value of every potential state. This problem typically arises in environments with high-dimensional state and action spaces and is compounded by the exploration–exploitation mechanism inherent in RL. This article proposes a balanced prioritized experience replay (BPER) algorithm based on experience rarity. First, an evaluation metric that quantifies the rarity of an experience is defined. Then, a sampling priority for each experience is computed from this metric. Finally, prioritized experience replay is performed according to these sampling priorities. BPER increases the sampling frequency of high-rarity experiences and decreases that of low-rarity experiences, enabling the agent to learn more comprehensive knowledge. We evaluate BPER on a series of MuJoCo continuous control tasks. Experimental results show that BPER effectively improves learning performance by mitigating the impact of the experience imbalance problem.
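The abstract outlines a three-step loop: define a rarity metric, convert it to sampling priorities, and replay according to those priorities. The Python sketch below is a minimal, hypothetical rendering of that loop, not the paper's implementation: the abstract does not define the actual rarity metric, so the nearest-neighbour distance proxy, the class name, and the parameters `alpha` and `k` are all assumptions for illustration.

```python
import numpy as np


class RarityPrioritizedReplayBuffer:
    """Minimal sketch of rarity-based prioritized replay.

    The rarity proxy (mean distance to the k nearest stored states)
    is a hypothetical stand-in; the paper defines its own metric.
    """

    def __init__(self, capacity, alpha=0.6, k=5):
        self.capacity = capacity  # maximum number of stored transitions
        self.alpha = alpha        # how strongly rarity skews sampling (assumed)
        self.k = k                # neighbours used by the rarity proxy (assumed)
        self.storage = []         # transitions: (s, a, r, s2, done)
        self.priorities = []      # one sampling priority per transition
        self.pos = 0              # ring-buffer write index

    def _rarity(self, state):
        # Step 1 (assumed proxy): a state far from previously seen states
        # is "rare"; score it by mean distance to its k nearest neighbours.
        if not self.storage:
            return 1.0
        states = np.stack([t[0] for t in self.storage])
        dists = np.linalg.norm(states - state, axis=1)
        k = min(self.k, len(dists))
        return float(np.sort(dists)[:k].mean())

    def add(self, state, action, reward, next_state, done):
        # Step 2: turn rarity into a sampling priority.
        priority = (self._rarity(state) + 1e-6) ** self.alpha
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(priority)
        else:
            self.storage[self.pos] = transition
            self.priorities[self.pos] = priority
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Step 3: replay with rarity-proportional probabilities, plus
        # importance-sampling weights to correct the induced bias.
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=p)
        weights = (len(self.storage) * p[idx]) ** -1.0
        weights = weights / weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights
```

In an off-policy agent such as SAC or TD3, `sample` would replace uniform minibatch draws, with `weights` applied to the TD loss, in the same way as in standard prioritized experience replay.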
Pages: 15721-15737
Page count: 16
Related papers (50 records)
  • [1] Cao, Xi; Wan, Huaiyu; Lin, Youfang; Han, Sheng. High-Value Prioritized Experience Replay for Off-policy Reinforcement Learning. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI 2019), 2019: 1510-1514.
  • [2] Liu, Xu-Hui; Xue, Zhenghai; Pang, Jing-Cheng; Jiang, Shengyi; Xu, Feng; Yu, Yang. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
  • [3] Kong, Seung-Hyun; Nahrendra, I. Made Aswin; Paek, Dong-Hee. Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay. IEEE Access, 2021, 9: 93152-93164.
  • [4] Hu, Zi-Jian; Gao, Xiao-Guang; Wan, Kai-Fang; Zhang, Le-Tian; Wang, Qiang-Long; Neretin, Evgeny. Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(11): 2237-2256.
  • [5] Wei, Wei; Wang, Da; Li, Lin; Liang, Jiye. Re-attentive experience replay in off-policy reinforcement learning. Machine Learning, 2024, 113(05): 2327-2349.
  • [6] Horvath, Daniel; Martin, Jesus Bujalance; Erdos, Ferenc Gabor; Istenes, Zoltan; Moutarde, Fabien. HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents. IEEE Access, 2024, 12: 100102-100119.
  • [7] Cicek, Dogan C.; Duran, Enes; Saglam, Baturay; Mutlu, Furkan B.; Kozat, Suleyman S. Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI 2021), 2021: 1255-1262.
  • [8] Yu, Jiayu; Li, Jingyao; Lu, Shuai; Han, Shuai. Mixed experience sampling for off-policy reinforcement learning. Expert Systems with Applications, 2024, 251.
  • [9] Munos, Remi; Stepleton, Thomas; Harutyunyan, Anna; Bellemare, Marc G. Safe and efficient off-policy reinforcement learning. Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016, 29.