Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards

Cited by: 5
Authors
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Dong, Kun [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Phys Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei 230026, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei 230088, Peoples R China
Keywords
Deep reinforcement learning; Robotic manipulation; Continual learning; Hindsight experience replay; Sparse reward;
DOI
10.1016/j.neucom.2023.126620
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning with sparse rewards remains a challenging problem in reinforcement learning (RL). In particular, for sequential object manipulation tasks, the RL agent generally receives a reward only upon successful completion of the entire task, leading to low exploration efficiency. To address this sample inefficiency, we propose a novel self-guided continual RL framework, named Relay Hindsight Experience Replay (RHER). RHER decomposes a sequential task into several subtasks of increasing complexity, allowing the agent to start from the simplest subtask and gradually complete the full task. Crucially, we propose a Self-Guided Exploration Strategy (SGES) that uses the already-learned policy for a simpler subtask to guide exploration of a more complex one. This strategy allows the agent to overcome the exploration barrier of sparse-reward sequential tasks and learn efficiently stage by stage. As a result, the proposed RHER method achieves state-of-the-art performance on the benchmark tasks FetchPush and FetchPickAndPlace. Furthermore, the experimental results demonstrate the superiority and high efficiency of RHER on a variety of single-object and multi-object manipulation tasks (e.g., ObstaclePush, DrawerBox, and TStack). Finally, RHER can also learn a contact-rich task on a real robot from scratch within 250 episodes.
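The abstract names two interacting mechanisms: stage-wise self-guided exploration (SGES) and hindsight relabeling (HER). The following minimal Python sketch illustrates how they can fit together on a toy two-subgoal task. It is an illustration under stated assumptions, not the authors' implementation: ToySequentialEnv, GoalPolicy, sges_rollout, and her_relabel are all hypothetical names introduced here.

import numpy as np

rng = np.random.default_rng(0)

class ToySequentialEnv:
    # Toy stand-in for a sequential manipulation task: a point agent must
    # visit subgoal 0, then subgoal 1. Sparse reward: 0 only when the final
    # subgoal is reached, -1 otherwise.
    EPS = 0.15
    def reset(self):
        self.pos = rng.uniform(-1, 1, 2)
        self.subgoals = [rng.uniform(-1, 1, 2) for _ in range(2)]
        return self.pos.copy()
    def step(self, action):
        self.pos = np.clip(self.pos + 0.2 * np.clip(action, -1, 1), -1, 1)
        done = self.reached(self.subgoals[-1])
        return self.pos.copy(), (0.0 if done else -1.0), done
    def reached(self, goal):
        return np.linalg.norm(self.pos - goal) < self.EPS

class GoalPolicy:
    # Stand-in for a goal-conditioned policy; a trained network in practice.
    def __init__(self, noise):
        self.noise = noise
    def act(self, obs, goal):
        return goal - obs + self.noise * rng.standard_normal(2)

def sges_rollout(env, guide_policy, learning_policy, horizon=40):
    # Self-Guided Exploration Strategy (SGES), as described in the abstract:
    # the already-learned policy for the simpler subtask drives the agent
    # until subgoal 0 is reached; the policy being trained for the harder
    # subtask then explores onward from that state.
    obs = env.reset()
    g0, g1 = env.subgoals
    traj = []
    for _ in range(horizon):
        if not env.reached(g0):
            action = guide_policy.act(obs, g0)      # guidance phase
        else:
            action = learning_policy.act(obs, g1)   # exploration phase
        next_obs, reward, done = env.step(action)
        traj.append((obs, action, reward, next_obs, g1))
        obs = next_obs
        if done:
            break
    return traj

def her_relabel(traj, eps=ToySequentialEnv.EPS):
    # Hindsight relabeling ("final" strategy): treat the state actually
    # achieved at the end of the episode as if it had been the desired
    # goal, so even failed rollouts yield a reward signal.
    achieved = traj[-1][3]
    relabeled = []
    for obs, action, _, next_obs, _ in traj:
        r = 0.0 if np.linalg.norm(next_obs - achieved) < eps else -1.0
        relabeled.append((obs, action, r, next_obs, achieved))
    return relabeled

env = ToySequentialEnv()
traj = sges_rollout(env, guide_policy=GoalPolicy(noise=0.05),
                    learning_policy=GoalPolicy(noise=0.5))
replay = traj + her_relabel(traj)
print(len(traj), "real transitions,", len(replay), "after HER relabeling")

In the full method, the handover would repeat across several subtask stages of increasing complexity, and both the real and relabeled transitions would feed an off-policy goal-conditioned learner rather than the fixed stand-in policies used above.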
Pages: 15
Related Papers
5 records in total
  • [1] Curriculum learning with Hindsight Experience Replay for sequential object manipulation tasks
    Manela, B.
    Biess, A.
    [J]. NEURAL NETWORKS, 2022, 145 : 260 - 270
  • [2] Multimodal fusion for autonomous navigation via deep reinforcement learning with sparse rewards and hindsight experience replay
    Xiao, Wendong
    Yuan, Liang
    Ran, Teng
    He, Li
    Zhang, Jianbo
    Cui, Jianping
    [J]. DISPLAYS, 2023, 78
  • [3] Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards
    Zuo, Guoyu
    Zhao, Qishen
    Lu, Jiahao
    Li, Jiangeng
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2020, 17 (01):
  • [4] SLER: Self-generated long-term experience replay for continual reinforcement learning
    Li, Chunmao
    Li, Yang
    Zhao, Yinliang
    Peng, Peng
    Geng, Xupeng
    [J]. APPLIED INTELLIGENCE, 2021, 51 (01) : 185 - 201
  • [5] Efficient Policy Learning for General Robotic Tasks with Adaptive Dual-memory Hindsight Experience Replay Based on Deep Reinforcement Learning
    Dong, Menghua
    Ying, Fengkang
    Li, Xiangjian
    Liu, Huashan
    [J]. 2023 7TH INTERNATIONAL CONFERENCE ON ROBOTICS, CONTROL AND AUTOMATION, ICRCA, 2023, : 62 - 66