SLER: Self-generated long-term experience replay for continual reinforcement learning

Cited by: 11
Authors
Li, Chunmao [1 ]
Li, Yang [1 ]
Zhao, Yinliang [1 ]
Peng, Peng [2 ]
Geng, Xupeng [1 ]
Affiliations
[1] Xi'an Jiaotong Univ, Dept Comp Sci & Technol, Xi'an 710049, Peoples R China
[2] Inspir Ai, Beijing, Peoples R China
Keywords
Continual reinforcement learning; Catastrophic forgetting; Dual experience replay; Experience replay model; NEURAL-NETWORKS; LEVEL; GAME; GO;
DOI
10.1007/s10489-020-01786-1
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning has achieved significant success in various domains. However, it still faces a major challenge when learning multiple tasks in sequence, because interaction in a complex setting requires continual learning and the data distribution changes over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting previous knowledge. However, catastrophic forgetting may occur when new experience overwrites previous experience due to limited memory size. The dual experience replay algorithm, which retains previous experience, is widely applied to reduce forgetting, but it does not scale to many tasks when the memory size is constrained. To alleviate this memory-size constraint, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, in which a short-term experience replay buffer retains the current task's experience and a long-term experience replay buffer retains the experience of all past tasks to achieve continual learning. In this paper, we first train an environment sample model, called the Experience Replay Model (ERM), to generate simulated state sequences of previous tasks for knowledge retention. We then combine the ERM with the experience of the new task to generate simulated experience for all previous tasks and thereby alleviate forgetting. Our method effectively decreases the memory size required for multi-task reinforcement learning. We show that in the StarCraft II and GridWorld environments our method performs better than the state-of-the-art deep learning method and achieves results comparable to those of the dual experience replay method, which retains the experience of all tasks.
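The abstract describes a dual-replay scheme in which a short-term buffer holds real experience of the current task while a generative model regenerates experience of earlier tasks instead of storing it. Below is a minimal, self-contained Python sketch of that idea, not the authors' implementation: the names ShortTermBuffer, ExperienceReplayModel, and replay_batch are illustrative assumptions, and a tabular transition table stands in for the paper's learned environment model (ERM).

# Minimal SLER-style dual replay sketch (illustrative only; see note above).
import random
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    state: int
    action: int
    reward: float
    next_state: int

class ShortTermBuffer:
    """Holds real transitions of the task currently being learned."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, k):
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

class ExperienceReplayModel:
    """Toy stand-in for the ERM: memorises the dynamics of finished tasks and
    regenerates synthetic transitions on demand instead of storing them all."""
    def __init__(self):
        self.dynamics = defaultdict(list)   # (state, action) -> [(reward, next_state)]
        self.seen_pairs = []

    def fit(self, transitions):
        for t in transitions:
            self.dynamics[(t.state, t.action)].append((t.reward, t.next_state))
            self.seen_pairs.append((t.state, t.action))

    def generate(self, k):
        synthetic = []
        for _ in range(k):
            state, action = random.choice(self.seen_pairs)
            reward, next_state = random.choice(self.dynamics[(state, action)])
            synthetic.append(Transition(state, action, reward, next_state))
        return synthetic

def replay_batch(short_term, erm, batch_size=32, old_fraction=0.5):
    """Mix real current-task samples with ERM-generated past-task samples."""
    n_old = int(batch_size * old_fraction) if erm.seen_pairs else 0
    return short_term.sample(batch_size - n_old) + erm.generate(n_old)

if __name__ == "__main__":
    random.seed(0)
    erm = ExperienceReplayModel()
    # Task 1: collect real experience, then distil it into the ERM.
    task1 = ShortTermBuffer()
    for _ in range(200):
        s, a = random.randrange(5), random.randrange(2)
        task1.add(Transition(s, a, float(a), (s + a) % 5))
    erm.fit(list(task1.buffer))
    # Task 2: new real experience is mixed with regenerated task-1 experience.
    task2 = ShortTermBuffer()
    for _ in range(200):
        s, a = random.randrange(5), random.randrange(2)
        task2.add(Transition(s, a, -float(a), (s - a) % 5))
    batch = replay_batch(task2, erm)
    print(f"training batch of {len(batch)} transitions (real + synthetic)")

In the paper the ERM is trained to reproduce past-task state sequences, so only the model's parameters (plus the current task's buffer) need to be kept, which is why the memory requirement no longer grows with the number of tasks.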
Pages: 185-201
Number of pages: 17