A New Reinforcement Learning Algorithm Based on Counterfactual Experience Replay

Cited by: 0
Authors
Li Menglin [1]
Chen Jing [1]
Chen Shaofei [1]
Gao Wei [1]
Affiliations
[1] Natl Univ Def Technol, Changsha 410005, Peoples R China
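Keywords
Reinforcement Learning; Experience Replay Mechanism; Sampling Mechanism
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract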
This paper proposes a new SARSA-based algorithm to avoid the overestimation problem in traditional reinforcement learning. Unlike traditional remedies for this problem, the new algorithm alleviates overestimation without significantly increasing algorithmic complexity. To address shortcomings of traditional SARSA, such as weak active exploration and unsatisfactory convergence, the paper also modifies the structure of Experience Memory Replay (EMR). The proposed algorithm, called DCER (Dynamic Counterfactual Experience Replay), changes the traditional experience replay structure by adding counterfactual experience, combining on-policy and off-policy learning. Exploration is improved by adding to the EMR, at sampling time, experiences that differ from the action actually taken. The algorithm was evaluated in the Gym CartPole environment and compared with the traditional algorithm in the same environment; the results show that it improves on the performance of SARSA. Finally, the feasibility of the algorithm in a multi-agent reinforcement learning environment is analyzed.
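The abstract does not give the details of DCER, so the following is only a minimal illustrative sketch of the general idea of mixing counterfactual transitions into a replay memory for a SARSA-style learner. It is not the authors' implementation. Everything here is an assumption made for illustration: the toy five-state chain environment, the `simulate` function (which stands in for whatever source of counterfactual outcomes the paper uses), the buffer size, and all hyperparameters. A tabular Q-table is used instead of CartPole so that the example stays self-contained.

```python
import random
from collections import deque

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def simulate(state, action):
    """Deterministic toy dynamics; returns (next_state, reward, done).

    Assumption: a model like this is available, so the outcome of an
    action that was NOT taken can be computed as a counterfactual.
    """
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, batch=8, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    memory = deque(maxlen=500)  # replay memory of (s, a, r, s2, done)
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap per episode
            action = epsilon_greedy(q, state, eps, rng)
            nxt, reward, done = simulate(state, action)
            memory.append((state, action, reward, nxt, done))
            # Counterfactual experience: also store what the actions
            # that were not taken would have produced from this state.
            for other in ACTIONS:
                if other != action:
                    c_nxt, c_reward, c_done = simulate(state, other)
                    memory.append((state, other, c_reward, c_nxt, c_done))
            # Replay a mini-batch with a SARSA-style target: the next
            # action is drawn from the current epsilon-greedy policy.
            k = min(batch, len(memory))
            for s, a, r, s2, d in rng.sample(list(memory), k):
                a2 = epsilon_greedy(q, s2, eps, rng)
                target = r if d else r + gamma * q[s2][a2]
                q[s][a] += alpha * (target - q[s][a])
            state = nxt
            if done:
                break
    return q
```

Calling `train()` returns the learned Q-table as a list of per-state action values. In this sketch the counterfactual transitions let the learner update values for actions it never executed, which is one plausible reading of how adding experiences that differ from the actual action could aid exploration.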
Pages: 1994-2001
Page count: 8