A New Reinforcement Learning Algorithm Based on Counterfactual Experience Replay

Cited by: 0
Authors:
Li Menglin [1]
Chen Jing [1]
Chen Shaofei [1]
Gao Wei [1]
Affiliation:
[1] Natl Univ Def Technol, Changsha 410005, Peoples R China
Keywords:
Reinforcement Learning; Experience Replay Mechanism; Sampling Mechanism
DOI:
Not available
Chinese Library Classification (CLC):
TP [Automation Technology; Computer Technology]
Discipline Code:
0812
Abstract:
A new algorithm based on SARSA is proposed to avoid the overestimation problem in traditional reinforcement learning. Unlike traditional approaches to this problem, the new algorithm alleviates overestimation without significantly increasing algorithmic complexity. At the same time, to address weaknesses of traditional SARSA, such as limited active exploration and unsatisfactory convergence, this paper modifies the structure of Experience Memory Replay (EMR). The proposed algorithm changes the traditional experience replay structure and adds counterfactual experiences, yielding DCER (Dynamic Counterfactual Experience Replay), which combines on-policy and off-policy learning. Exploration is strengthened by adding to EMR experiences whose actions differ from the action actually taken. The algorithm was evaluated in the Gym CartPole environment and compared with the traditional algorithm in the same setting, showing that the modification improves the performance of SARSA. Finally, the feasibility of the algorithm in a multi-agent reinforcement learning environment is analyzed.
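The abstract describes a replay buffer that stores, alongside each real SARSA transition, a counterfactual transition whose action differs from the one actually taken. The paper's exact construction is not given here, so the sketch below is a hypothetical illustration: the `counterfactual_model` callback (an assumption, not from the paper) supplies the imagined reward and next state for the unchosen action.

```python
import random
from collections import deque


class DCERBuffer:
    """Hypothetical sketch of a Dynamic Counterfactual Experience Replay buffer.

    For every real (s, a, r, s', a') transition, one extra transition with a
    different action is also stored. How the counterfactual reward and next
    state are obtained is an assumption here (a caller-supplied model); the
    paper may construct them differently.
    """

    def __init__(self, capacity, n_actions):
        self.buffer = deque(maxlen=capacity)
        self.n_actions = n_actions

    def add(self, s, a, r, s_next, a_next, counterfactual_model=None):
        # Store the actual on-policy (SARSA-style) transition.
        self.buffer.append((s, a, r, s_next, a_next))
        # Additionally store a counterfactual transition for an action
        # different from the one actually taken, to aid exploration.
        if counterfactual_model is not None:
            a_cf = random.choice(
                [x for x in range(self.n_actions) if x != a]
            )
            r_cf, s_next_cf = counterfactual_model(s, a_cf)
            self.buffer.append((s, a_cf, r_cf, s_next_cf, a_next))

    def sample(self, batch_size):
        # Uniform sampling over real and counterfactual experiences.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# Toy usage with a dummy counterfactual model that predicts zero reward
# and no state change (purely illustrative).
buf = DCERBuffer(capacity=100, n_actions=2)
dummy_model = lambda s, a: (0.0, s)
buf.add(s=0, a=1, r=1.0, s_next=1, a_next=0, counterfactual_model=dummy_model)
```

Because the counterfactual transitions were not generated by the current policy, learning from them is off-policy, while the real transitions remain on-policy, matching the abstract's claim that DCER combines both regimes.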
Pages: 1994-2001 (8 pages)