A New Reinforcement Learning Algorithm Based on Counterfactual Experience Replay

Cited by: 0

Authors:

Li Menglin [1]

Chen Jing [1]

Chen Shaofei [1]

Gao Wei [1]

Affiliations:

[1] National University of Defense Technology, Changsha 410005, People's Republic of China

Source:

PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE | 2020

Keywords:

Reinforcement Learning; Experience Replay Mechanism; Sampling Mechanism

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This paper proposes a new SARSA-based algorithm that mitigates the overestimation problem in traditional reinforcement learning. Unlike traditional remedies for this problem, the new algorithm alleviates overestimation without significantly increasing algorithmic complexity. The paper also targets two weaknesses of traditional SARSA, weak active exploration and unsatisfactory convergence, by modifying the structure of Experience Memory Replay (EMR). The resulting method, Dynamic Counterfactual Experience Replay (DCER), changes the traditional experience replay structure by adding counterfactual experience, and combines on-policy and off-policy learning. Exploration improves because, at sampling time, experiences that differ from the action actually taken are added to the EMR. The algorithm was applied in the Gym CartPole environment and compared with the traditional algorithm in the same environment; the results indicate that it improves on the performance of SARSA. Finally, the feasibility of the algorithm in a multi-agent reinforcement learning environment is analyzed.
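For readers unfamiliar with the terms in the abstract, the following is a minimal sketch, not the authors' DCER implementation. It contrasts the standard tabular SARSA target with the Q-learning target, whose max operator is the usual source of overestimation, and shows a replay memory whose entries are flagged as actual or counterfactual. The names `sarsa_target`, `q_learning_target`, and `ReplayMemory`, the flagging scheme, and the values of `GAMMA` and `ALPHA` are all assumptions made for illustration; the abstract does not say how counterfactual transitions are constructed, so that step is left out.

```python
import random
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, not from the paper)
ALPHA = 0.1   # learning rate (assumed value, not from the paper)


def sarsa_target(q, reward, next_state, next_action, done):
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward if done else reward + GAMMA * q[next_state][next_action]


def q_learning_target(q, reward, next_state, done):
    """Off-policy target: bootstrap from the max over next actions.

    Taking the max over noisy estimates tends to bias the target upward.
    """
    return reward if done else reward + GAMMA * max(q[next_state])


def td_update(q, state, action, target):
    """Move Q(state, action) a fraction ALPHA toward the target."""
    q[state][action] += ALPHA * (target - q[state][action])


class ReplayMemory:
    """Fixed-size buffer; the counterfactual flag is a hypothetical design."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first

    def push(self, transition, counterfactual=False):
        self.buffer.append((transition, counterfactual))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)
```

With `q = [[0.0, 0.0], [1.0, 5.0]]`, a reward of 1.0, and next state 1, the SARSA target for next action 0 uses the value 1.0, while the Q-learning target uses the maximum 5.0, which illustrates how the two targets can diverge when one action's estimate is inflated.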

Pages: 1994-2001

Page count: 8
