HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents

Cited by: 0
Authors
Horvath, Daniel [1 ,2 ,3 ]
Martin, Jesus Bujalance [1 ]
Erdos, Ferenc Gabor [2 ]
Istenes, Zoltan [3 ]
Moutarde, Fabien [1 ]
Affiliations
[1] PSL Univ, Ctr Robot, Mines Paris, F-75272 Paris, France
[2] Hungarian Res Network, Inst Comp Sci & Control, Ctr Excellence Prod Informat & Control, H-1111 Budapest, Hungary
[3] Eotvos Lorand Univ, CoLocat Ctr Acad & Ind Cooperat, H-1117 Budapest, Hungary
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Task analysis; Data collection; Training; Robots; Standards; Random variables; Process control; Curriculum development; Curriculum learning; experience replay; reinforcement learning; robotics
DOI
10.1109/ACCESS.2024.3427012
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, the field of robotics poses significant challenges, as the state and action spaces are continuous and the reward function is predominantly sparse. Furthermore, in many cases the agent has no access to any form of demonstration. Inspired by human learning, in this work we propose a method named highlight experience replay (HiER), which creates a secondary highlight replay buffer for the most relevant experiences. For the weight updates, transitions are sampled from both the standard and the highlight experience replay buffers. HiER can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves on the performance of the state of the art, validated on eight tasks across three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+, in which HiER is enhanced with an arbitrary data-collection curriculum-learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.
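The core mechanism described in the abstract (a secondary buffer for the most relevant experiences, with each training batch drawn from both buffers) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the authors' implementation: the class and parameter names are invented, an episode is treated as a highlight when its return exceeds a fixed threshold, and a fixed fraction xi of each batch is drawn from the highlight buffer. The actual code is linked on the project site.

    import random
    from collections import deque

    class DualReplayBuffer:
        """Illustrative sketch of a HiER-style dual replay buffer (assumed API)."""

        def __init__(self, capacity=100_000, hl_capacity=10_000,
                     hl_threshold=0.0, xi=0.25):
            self.standard = deque(maxlen=capacity)      # standard ER buffer
            self.highlight = deque(maxlen=hl_capacity)  # secondary highlight buffer
            self.hl_threshold = hl_threshold  # assumed highlight criterion
            self.xi = xi                      # assumed highlight fraction per batch

        def store_episode(self, transitions, episode_return):
            # Every transition goes into the standard buffer.
            self.standard.extend(transitions)
            # Episodes judged "relevant" are additionally copied to the
            # highlight buffer (here: return above a fixed threshold).
            if episode_return > self.hl_threshold:
                self.highlight.extend(transitions)

        def sample(self, batch_size):
            # Mix the batch: up to xi * batch_size highlight transitions,
            # the remainder sampled from the standard buffer.
            n_hl = min(int(self.xi * batch_size), len(self.highlight))
            n_std = min(batch_size - n_hl, len(self.standard))
            batch = random.sample(list(self.standard), n_std)
            batch += random.sample(list(self.highlight), n_hl)
            random.shuffle(batch)
            return batch

Because only the sampling distribution changes, such a mixing scheme is orthogonal to HER-style goal relabeling and PER-style priorities, which is consistent with the abstract's note that HiER can be combined with either technique.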
Pages: 100102-100119
Page count: 18
Related Papers (50 in total)
  • [1] Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
    Liu, Xu-Hui
    Xue, Zhenghai
    Pang, Jing-Cheng
    Jiang, Shengyi
    Xu, Feng
    Yu, Yang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Balanced prioritized experience replay in off-policy reinforcement learning
    Lou, Zhouwei
    Wang, Yiye
    Shan, Shuo
    Zhang, Kanjian
    Wei, Haikun
    [J]. Neural Computing and Applications, 2024, 36 (25) : 15721 - 15737
  • [3] Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay
    Kong, Seung-Hyun
    Nahrendra, I. Made Aswin
    Paek, Dong-Hee
    [J]. IEEE ACCESS, 2021, 9 : 93152 - 93164
  • [4] Research on Experience Replay of Off-policy Deep Reinforcement Learning: A Review
    Hu, Zi-Jian
    Gao, Xiao-Guang
    Wan, Kai-Fang
    Zhang, Le-Tian
    Wang, Qiang-Long
    Neretin, Evgeny
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (11): 2237 - 2256
  • [5] Re-attentive experience replay in off-policy reinforcement learning
    Wei, Wei
    Wang, Da
    Li, Lin
    Liang, Jiye
    [J]. MACHINE LEARNING, 2024, 113 (05) : 2327 - 2349
  • [6] High-Value Prioritized Experience Replay for Off-policy Reinforcement Learning
    Cao, Xi
    Wan, Huaiyu
    Lin, Youfang
    Han, Sheng
    [J]. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019: 1510 - 1514
  • [7] Mixed experience sampling for off-policy reinforcement learning
    Yu, Jiayu
    Li, Jingyao
    Lu, Shuai
    Han, Shuai
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [8] Safe and efficient off-policy reinforcement learning
    Munos, Remi
    Stepleton, Thomas
    Harutyunyan, Anna
    Bellemare, Marc G.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [9] Bounds for Off-policy Prediction in Reinforcement Learning
    Joseph, Ajin George
    Bhatnagar, Shalabh
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017: 3991 - 3997