Experience Replay Optimization via ESMM for Stable Deep Reinforcement Learning

Cited by: 0
Authors
Osei, Richard Sakyi [1 ]
Lopez, Daphne [1 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci Engn & Informat Syst, Vellore, India
Keywords
Experience replay; experience replay optimization; experience retention strategy; experience selection strategy; replay memory management;
DOI
Not available
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The memorization and reuse of experience, popularly known as experience replay (ER), has improved the performance of off-policy deep reinforcement learning (DRL) algorithms such as deep Q-networks (DQN) and deep deterministic policy gradients (DDPG). Despite its success, ER faces the challenges of noisy transitions, large memory sizes, and unstable returns. Researchers have introduced replay mechanisms focusing on experience selection strategies to address these issues. However, the choice of experience retention strategy has a significant influence on the selection strategy. Experience Replay Optimization (ERO) is a reinforcement learning algorithm that uses a deep replay policy for experience selection. However, ERO relies on the naive first-in-first-out (FIFO) retention strategy, which manages the replay memory by always retaining the most recent experiences irrespective of their relevance to the agent's learning: when the replay memory is full, FIFO sequentially overwrites the oldest experience with a new one. To improve the retention strategy of ERO, we propose experience replay optimization with enhanced sequential memory management (ERO-ESMM). ERO-ESMM uses an improved sequential retention strategy to manage the replay memory efficiently and stabilize the performance of the DRL agent. The efficacy of the ESMM strategy is evaluated against five other fundamental retention strategies across four distinct OpenAI environments. The experimental results indicate that ESMM performs better than the other five retention strategies.
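The FIFO retention strategy that the abstract describes can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation; the class name `FIFOReplayBuffer` and its methods are hypothetical:

```python
import random
from collections import deque


class FIFOReplayBuffer:
    """Minimal FIFO replay memory: once the buffer is full, storing a new
    transition overwrites the oldest one, regardless of its learning value."""

    def __init__(self, capacity):
        # deque with maxlen automatically discards the oldest item when full,
        # which is exactly the FIFO retention behavior described in the abstract
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        # transition is typically a tuple (state, action, reward, next_state, done)
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random selection; ERO instead learns a deep replay policy
        # to decide which stored transitions to replay
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A retention strategy such as ESMM would change which slot is evicted on `store` (rather than always the oldest); the paper evaluates that choice separately from the selection step performed in `sample`.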
Pages: 715 / 723
Number of pages: 9