An analysis of experience replay in temporal difference learning

Cited by: 16
Authors
Cichosz, P. [1]
Affiliation
[1] Warsaw Univ Technol, Inst Elect Syst, PL-00665 Warsaw, Poland
DOI
10.1080/019697299125127
Chinese Library Classification (CLC)
TP3 [Computing Technology and Computer Technology];
Subject Classification Code
0812;
Abstract
Temporal difference (TD) methods are used by reinforcement learning algorithms for predicting future rewards. This article analyzes theoretically, and illustrates experimentally, the effects of performing TD(lambda) prediction updates backwards over a number of past experiences. More precisely, two related techniques described in the literature are examined, referred to as replayed TD and backwards TD. The former is essentially an online learning method which performs a regular TD(0) update at each time step and then replays updates backwards for a number of previous states. The latter operates in offline mode, updating the predictions for all visited states backwards after the end of a trial. They are both shown to be approximately equivalent to TD(lambda) with variable lambda values selected in a particular way; this is true even though they perform only TD(0) updates. The experimental results show that replayed TD(0) is competitive with TD(lambda) with regard to learning speed and quality.
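The abstract describes the replayed TD(0) procedure only in words. The following is a minimal tabular sketch of that update pattern, under assumptions not taken from the paper: a dict-backed value function and illustrative names (replayed_td0_step, replay_depth, alpha, gamma) that are not the paper's notation.

```python
from collections import deque

def replayed_td0_step(V, history, s, r, s_next,
                      alpha=0.1, gamma=0.95, replay_depth=5):
    """One online step of replayed TD(0) for tabular prediction (illustrative sketch).

    V       -- dict mapping states to value estimates
    history -- deque of earlier (state, reward, next_state) transitions
    """
    # Regular TD(0) update for the current transition.
    V[s] = V.get(s, 0.0) + alpha * (r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0))

    # Replay TD(0) updates backwards (most recent first) over up to
    # `replay_depth` previously visited transitions.
    for ps, pr, pns in reversed(list(history)[-replay_depth:]):
        V[ps] = V.get(ps, 0.0) + alpha * (pr + gamma * V.get(pns, 0.0) - V.get(ps, 0.0))

    # Record the current transition so later steps can replay it.
    history.append((s, r, s_next))
    return V

# Usage sketch: V = {}; history = deque(maxlen=50); then call
# replayed_td0_step(V, history, s, r, s_next) once per observed transition.
```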
Pages: 341-363
Number of pages: 23
Related Papers
50 items in total
  • [1] Correlation minimizing replay memory in temporal-difference reinforcement learning
    Ramicic, Mirza
    Bonarini, Andrea
    [J]. NEUROCOMPUTING, 2020, 393 : 91 - 100
  • [2] Hippocampal replay contributes to within session learning in a temporal difference reinforcement learning model
    Johnson, A
    Redish, AD
    [J]. NEURAL NETWORKS, 2005, 18 (09) : 1163 - 1171
  • [3] Experience Replay for Continual Learning
    Rolnick, David
    Ahuja, Arun
    Schwarz, Jonathan
    Lillicrap, Timothy P.
    Wayne, Greg
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Tractable Reinforcement Learning for Signal Temporal Logic Tasks With Counterfactual Experience Replay
    Wang, Siqi
    Yin, Xunyuan
    Li, Shaoyuan
    Yin, Xiang
    [J]. IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 616 - 621
  • [5] CONVERGENCE ANALYSIS ON TEMPORAL DIFFERENCE LEARNING
    Leng, Jinsong
    Jain, Lakhmi
    Fyfe, Colin
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (04) : 913 - 922
  • [6] Autonomous reinforcement learning with experience replay
    Wawrzynski, Pawel
    Tanwani, Ajay Kumar
    [J]. NEURAL NETWORKS, 2013, 41 : 156 - 167
  • [7] Learning on Streaming Graphs with Experience Replay
    Perini, Massimo
    Ramponi, Giorgia
    Carbone, Paris
    Kalavri, Vasiliki
    [J]. 37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 470 - 478
  • [8] Selective Experience Replay for Lifelong Learning
    Isele, David
    Cosgun, Akansel
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3302 - 3309
  • [9] An Analysis of Quantile Temporal-Difference Learning
    Rowland, Mark
    Munos, Remi
    Azar, Mohammad Gheshlaghi
    Tang, Yunhao
    Ostrovski, Georg
    Harutyunyan, Anna
    Tuyls, Karl
    Bellemare, Marc G.
    Dabney, Will
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [10] Memory Efficient Experience Replay for Streaming Learning
    Hayes, Tyler L.
    Cahill, Nathan D.
    Kanan, Christopher
    [J]. 2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 9769 - 9776