A New Reinforcement Learning Algorithm Based on Counterfactual Experience Replay

Cited by: 0

Authors:

Li Menglin [1]

Chen Jing [1]

Chen Shaofei [1]

Gao Wei [1]

Affiliation:

[1] Natl Univ Def Technol, Changsha 410005, Peoples R China

Source:

PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE | 2020

Keywords:

Reinforcement Learning; Experience Replay Mechanism; Sampling Mechanism

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

A new algorithm based on SARSA is proposed to mitigate the overestimation problem in traditional reinforcement learning. Unlike traditional methods for overcoming this problem, the new algorithm alleviates overestimation without significantly increasing algorithmic complexity. In addition, to address problems in traditional SARSA, such as weak active exploration and unsatisfactory convergence results, this paper modifies the structure of Experience Memory Replay (EMR). The proposed algorithm changes the traditional experience replay structure by adding counterfactual experience; the resulting method, DCER (Dynamic Counterfactual Experience Replay), combines on-policy and off-policy learning. Exploration performance is improved by adding to the EMR, at sampling time, experiences that differ from the action actually taken. The algorithm was applied to the Gym CartPole environment and compared with the traditional algorithm in the same environment; the results indicate that it improves on the performance of SARSA. Finally, the feasibility of the algorithm in a multi-agent reinforcement learning environment is analyzed.
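The abstract does not spell out DCER's update rules, so the following is only a minimal sketch of the general idea of adding counterfactual experience to a replay memory, not the authors' implementation. It assumes tabular SARSA, uniform sampling from the memory, and a hypothetical 6-state chain task (rather than Gym CartPole) whose dynamics can be queried for actions that were not taken. All names (`step`, `epsilon_greedy`, `sarsa_update`, `train`) and hyperparameters are illustrative.

```python
import random
from collections import deque

N_STATES = 6          # chain states 0..5; state 5 is terminal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, eps, rng):
    """Random action with probability eps, else a greedy one (random tie-break)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa_update(q, s, a, r, s2, a2, done, alpha, gamma):
    """One SARSA backup: the target uses the action a2 chosen by the policy."""
    target = r if done else r + gamma * q[s2][a2]
    q[s][a] += alpha * (target - q[s][a])


def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, batch=8, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    memory = deque(maxlen=500)   # holds actual and counterfactual transitions
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        a = epsilon_greedy(q, s, eps, rng)
        while not done and steps < 100:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q, s2, eps, rng)
            sarsa_update(q, s, a, r, s2, a2, done, alpha, gamma)  # on-policy step
            memory.append((s, a, r, s2, done))
            # Counterfactual experience (assumption): outcomes of untaken actions,
            # obtained here by querying the known toy dynamics.
            for other in ACTIONS:
                if other != a:
                    cs2, cr, cdone = step(s, other)
                    memory.append((s, other, cr, cs2, cdone))
            # Replay a small uniform batch; next actions are re-drawn from the
            # current policy so the backup keeps a SARSA-style target.
            k = min(batch, len(memory))
            for ms, ma, mr, ms2, mdone in rng.sample(list(memory), k):
                ma2 = epsilon_greedy(q, ms2, eps, rng)
                sarsa_update(q, ms, ma, mr, ms2, ma2, mdone, alpha, gamma)
            s, a = s2, a2
            steps += 1
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

In this sketch the counterfactual tuples let the agent learn about actions it did not execute, which is one way to read the abstract's claim about improved exploration. In an environment such as CartPole, where untaken actions cannot be queried for free, those outcomes would need another source (for example a copied simulator state or a learned model), which the abstract does not specify.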

Citation

Pages: 1994-2001

Page count: 8

50 items in total

[41] Multi-Input Autonomous Driving Based on Deep Reinforcement Learning With Double Bias Experience Replay
Cui, Jianping
Yuan, Liang
He, Li
Xiao, Wendong
Ran, Teng
Zhang, Jianbo
IEEE SENSORS JOURNAL, 2023, 23 (11) : 11253 - 11261
[42] Unveiling the Effects of Experience Replay on Deep Reinforcement Learning-based Power Allocation in Wireless Networks
Kopic, Amna
Perenda, Erma
Gacanin, Haris
2024 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC 2024, 2024
[43] Exploring a Reinforcement Learning Agent with Improved Prioritized Experience Replay for a Confrontation Game
Zhao, Tian
2022 INTERNATIONAL CONFERENCE ON BIG DATA, INFORMATION AND COMPUTER NETWORK (BDICN 2022), 2022, : 373 - 381
[44] Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
Hu Z.-J.
Gao X.-G.
Wan K.-F.
Zhang L.-T.
Wang Q.-L.
Neretin E.
Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (11) : 2237 - 2256
[45] Multi-agent collaborative path planning algorithm with reinforcement learning and combined prioritized experience replay in Internet of Things
Liu, Ping
Ma, Xiangyu
Ding, Jie
Gu, Chenyu
COMPUTERS & ELECTRICAL ENGINEERING, 2024, 116
[46] Re-attentive experience replay in off-policy reinforcement learning
Wei, Wei
Wang, Da
Li, Lin
Liang, Jiye
MACHINE LEARNING, 2024, 113 (05) : 2327 - 2349
[47] Reinforcement Learning with Experience Replay for Model-Free Humanoid Walking Optimization
Wawrzynski, Pawel
INTERNATIONAL JOURNAL OF HUMANOID ROBOTICS, 2014, 11 (03)
[48] Deep reinforcement learning via good choice resampling experience replay memory
Chen X.-L.
Cao L.
Li C.-X.
Xu Z.-X.
He M.
Chen, Xi-Liang (383618393@qq.com), 2018, Northeast University (33) : 600 - 606
[49] Re-attentive experience replay in off-policy reinforcement learning
Wei Wei
Da Wang
Lin Li
Jiye Liang
Machine Learning, 2024, 113 : 2327 - 2349
[50] The Effects of Memory Replay in Reinforcement Learning
Liu, Ruishan
Zou, James
2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 478 - 485
