Balanced prioritized experience replay in off-policy reinforcement learning

Cited by: 0
Authors
Zhouwei Lou
Yiye Wang
Shuo Shan
Kanjian Zhang
Haikun Wei
Institutions
[1] Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education
[2] School of Automation, Southeast University
Keywords
Balanced prioritized experience replay (BPER); Reinforcement learning (RL); Experience imbalance; Experience rarity
DOI
10.1007/s00521-024-09913-6
Abstract
In off-policy reinforcement learning (RL), the experience imbalance problem can degrade learning performance. Experience imbalance refers to the phenomenon in which the experiences the agent collects during learning are unevenly distributed across the state space, leaving the agent unable to accurately estimate the value of each potential state. The problem typically arises in environments with high-dimensional state and action spaces and is compounded by the exploration–exploitation mechanism inherent in RL. This article proposes a balanced prioritized experience replay (BPER) algorithm based on experience rarity. First, an evaluation metric that quantifies experience rarity is defined. Then, the sampling priority of each experience is computed from this metric. Finally, prioritized experience replay is performed according to these sampling priorities. BPER increases the sampling frequency of high-rarity experiences and decreases that of low-rarity experiences, enabling the agent to learn more comprehensive knowledge. We evaluate BPER on a series of MuJoCo continuous control tasks. Experimental results show that BPER effectively improves learning performance while mitigating the impact of the experience imbalance problem.
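The abstract outlines a three-step recipe: score each experience by a rarity metric, turn the scores into sampling priorities, and replay according to those priorities. As a rough illustration, the Python sketch below implements a rarity-weighted replay buffer. The concrete rarity metric here (mean Euclidean distance from a state to the states already stored) and the names BalancedReplayBuffer and alpha are assumptions for illustration only; the paper's own metric is not reproduced in this abstract.

```python
import numpy as np

class BalancedReplayBuffer:
    """Minimal sketch of rarity-weighted experience replay.

    The rarity metric below is an assumption: a transition whose state
    lies far from the states already in the buffer is treated as rarer.
    The paper defines its own metric, which the abstract does not give.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly rarity skews sampling
        self.storage = []         # (state, action, reward, next_state, done)
        self.rarity = []          # one rarity score per stored transition
        self.pos = 0              # next slot to overwrite once full

    def _rarity_score(self, state):
        # Assumed metric: mean Euclidean distance to states already stored.
        # Scored once at insertion time, a deliberate simplification.
        if not self.storage:
            return 1.0
        states = np.stack([t[0] for t in self.storage])
        return float(np.linalg.norm(states - state, axis=1).mean())

    def add(self, state, action, reward, next_state, done):
        state = np.asarray(state, dtype=np.float64)
        next_state = np.asarray(next_state, dtype=np.float64)
        score = self._rarity_score(state)
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.rarity.append(score)
        else:
            self.storage[self.pos] = transition
            self.rarity[self.pos] = score
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Priority = rarity^alpha, normalized into a distribution, so
        # high-rarity transitions are replayed more often than common ones.
        scores = np.asarray(self.rarity) ** self.alpha
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]
```

Unlike TD-error-based prioritized experience replay, the priorities in this sketch depend only on how well a region of the state space is covered, which matches the balancing intent described in the abstract; in practice the two signals could also be combined.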
Pages: 15721–15737 (16 pages)