Autonomous reinforcement learning with experience replay

Cited by: 38
Authors
Wawrzynski, Pawel [1]
Tanwani, Ajay Kumar [1, 2]
Affiliations
[1] Warsaw Univ Technol, Inst Control & Computat Engn, Warsaw, Poland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords
Reinforcement learning; Autonomous learning; Step-size estimation; Actor-critic; ACTOR-CRITIC ALGORITHMS; RATE ADAPTATION; ENVIRONMENTS; CONVERGENCE; NETWORKS
DOI
10.1016/j.neunet.2012.11.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates. The algorithm is based on the actor-critic with experience replay whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates the feasibility of the proposed algorithm to solve difficult learning control problems in an autonomous way within a reasonably short time. © 2012 Elsevier Ltd. All rights reserved.
Pages: 156-167
Number of pages: 12