Autonomous reinforcement learning with experience replay

Cited by: 38

Authors:

Wawrzynski, Pawel [1]

Tanwani, Ajay Kumar [1, 2]

Affiliations:

[1] Warsaw Univ Technol, Inst Control & Computat Engn, Warsaw, Poland

[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland

Source:

NEURAL NETWORKS | 2013 / Vol. 41

Keywords:

Reinforcement learning; Autonomous learning; Step-size estimation; Actor-critic; ACTOR-CRITIC ALGORITHMS; RATE ADAPTATION; ENVIRONMENTS; CONVERGENCE; NETWORKS;

DOI:

10.1016/j.neunet.2012.11.007

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and a half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. (c) 2012 Elsevier Ltd. All rights reserved.

Pages: 156-167

Number of pages: 12

