Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards

Cited by: 5
Authors
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Dong, Kun [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei 230088, Peoples R China
Keywords
Deep reinforcement learning; Robotic manipulation; Continual learning; Hindsight experience replay; Sparse reward;
DOI
10.1016/j.neucom.2023.126620
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally receives a reward only upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, we propose a Self-Guided Exploration Strategy (SGES) that uses the already-learned policy for a simpler subtask to guide exploration of a more complex one. This strategy allows the agent to overcome the exploration barrier of sparse-reward sequential tasks and learn efficiently stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks FetchPush and FetchPickAndPlace. Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, and TStack). Finally, RHER can also learn a contact-rich task on a real robot from scratch within 250 episodes.
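The abstract names two interacting mechanisms: stage-wise self-guided exploration (SGES) and hindsight relabeling (HER). The following minimal Python sketch illustrates how they can fit together on a toy two-subgoal task. It is an illustration under stated assumptions, not the authors' implementation: ToySequentialEnv, GoalPolicy, sges_rollout, and her_relabel are all hypothetical names introduced here.

import numpy as np

rng = np.random.default_rng(0)

class ToySequentialEnv:
    # Toy stand-in for a sequential manipulation task: a point agent must
    # visit subgoal 0, then subgoal 1. Sparse reward: 0 only when the final
    # subgoal is reached, -1 otherwise.
    EPS = 0.15
    def reset(self):
        self.pos = rng.uniform(-1, 1, 2)
        self.subgoals = [rng.uniform(-1, 1, 2) for _ in range(2)]
        return self.pos.copy()
    def step(self, action):
        self.pos = np.clip(self.pos + 0.2 * np.clip(action, -1, 1), -1, 1)
        done = self.reached(self.subgoals[-1])
        return self.pos.copy(), (0.0 if done else -1.0), done
    def reached(self, goal):
        return np.linalg.norm(self.pos - goal) < self.EPS

class GoalPolicy:
    # Stand-in for a goal-conditioned policy; a trained network in practice.
    def __init__(self, noise):
        self.noise = noise
    def act(self, obs, goal):
        return goal - obs + self.noise * rng.standard_normal(2)

def sges_rollout(env, guide_policy, learning_policy, horizon=40):
    # Self-Guided Exploration Strategy (SGES), as described in the abstract:
    # the already-learned policy for the simpler subtask drives the agent
    # until subgoal 0 is reached; the policy being trained for the harder
    # subtask then explores onward from that state.
    obs = env.reset()
    g0, g1 = env.subgoals
    traj = []
    for _ in range(horizon):
        if not env.reached(g0):
            action = guide_policy.act(obs, g0)      # guidance phase
        else:
            action = learning_policy.act(obs, g1)   # exploration phase
        next_obs, reward, done = env.step(action)
        traj.append((obs, action, reward, next_obs, g1))
        obs = next_obs
        if done:
            break
    return traj

def her_relabel(traj, eps=ToySequentialEnv.EPS):
    # Hindsight relabeling ("final" strategy): treat the state actually
    # achieved at the end of the episode as if it had been the desired
    # goal, so even failed rollouts yield a reward signal.
    achieved = traj[-1][3]
    relabeled = []
    for obs, action, _, next_obs, _ in traj:
        r = 0.0 if np.linalg.norm(next_obs - achieved) < eps else -1.0
        relabeled.append((obs, action, r, next_obs, achieved))
    return relabeled

env = ToySequentialEnv()
traj = sges_rollout(env, guide_policy=GoalPolicy(noise=0.05),
                    learning_policy=GoalPolicy(noise=0.5))
replay = traj + her_relabel(traj)
print(len(traj), "real transitions,", len(replay), "after HER relabeling")

In the full method, the handover would repeat across several subtask stages of increasing complexity, and both the real and relabeled transitions would feed an off-policy goal-conditioned learner rather than the fixed stand-in policies used above.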
Pages: 15
Related Papers
5 records in total
  • [1] Curriculum learning with Hindsight Experience Replay for sequential object manipulation tasks
    Manela, B.
    Biess, A.
    [J]. NEURAL NETWORKS, 2022, 145 : 260 - 270
  • [2] Multimodal fusion for autonomous navigation via deep reinforcement learning with sparse rewards and hindsight experience replay
    Xiao, Wendong
    Yuan, Liang
    Ran, Teng
    He, Li
    Zhang, Jianbo
    Cui, Jianping
    [J]. DISPLAYS, 2023, 78
  • [3] Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards
    Zuo, Guoyu
    Zhao, Qishen
    Lu, Jiahao
    Li, Jiangeng
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (01):
  • [4] SLER: Self-generated long-term experience replay for continual reinforcement learning
    Li, Chunmao
    Li, Yang
    Zhao, Yinliang
    Peng, Peng
    Geng, Xupeng
    [J]. APPLIED INTELLIGENCE, 2021, 51 (01) : 185 - 201
  • [5] Efficient Policy Learning for General Robotic Tasks with Adaptive Dual-memory Hindsight Experience Replay Based on Deep Reinforcement Learning
    Dong, Menghua
    Ying, Fengkang
    Li, Xiangjian
    Liu, Huashan
    [J]. 2023 7TH INTERNATIONAL CONFERENCE ON ROBOTICS, CONTROL AND AUTOMATION, ICRCA, 2023, : 62 - 66