End-to-end nonprehensile rearrangement with deep reinforcement learning and simulation-to-reality transfer

Cited by: 30
Authors
Yuan, Weihao [1 ]
Hang, Kaiyu [2 ]
Kragic, Danica [3 ]
Wang, Michael Y. [1 ]
Stork, Johannes A. [4 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, ECE, Robot Inst, Hong Kong, Peoples R China
[2] Yale Univ, Mech Engn & Mat Sci, New Haven, CT USA
[3] KTH Royal Inst Technol, EECS, Ctr Autonomous Syst, Stockholm, Sweden
[4] Orebro Univ, Ctr Appl Autonomous Sensor Syst, Orebro, Sweden
Keywords
Nonprehensile rearrangement; Deep reinforcement learning; Transfer learning
DOI
10.1016/j.robot.2019.06.007
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Nonprehensile rearrangement is the problem of controlling a robot to interact with objects through pushing actions in order to reconfigure the objects into a predefined goal pose. In this work, we rearrange one object at a time in an environment with obstacles using an end-to-end policy that maps raw pixels as visual input to control actions without any form of engineered feature extraction. To reduce the amount of training data that needs to be collected with a real robot, we propose a simulation-to-reality transfer approach. In the first step, we model the nonprehensile rearrangement task in simulation and use deep reinforcement learning to learn a suitable rearrangement policy, which requires on the order of hundreds of thousands of example actions for training. Thereafter, we collect a small dataset of only 70 episodes of real-world actions as supervised examples for adapting the learned rearrangement policy to real-world input data. In this process, we make use of newly proposed strategies for improving the reinforcement learning process, such as heuristic exploration and the curation of a balanced set of experiences. We evaluate our method in both simulation and real-world settings using a Baxter robot to show that the proposed approach can effectively improve the training process in simulation, as well as efficiently adapt the learned policy to the real-world application, even when the camera pose differs from that in simulation. Additionally, we show that the learned system not only provides adaptive behavior to handle unforeseen events during execution, such as distracting objects, sudden changes in object positions, and obstacles, but also deals with obstacle shapes that were not present during training. (C) 2019 Elsevier B.V. All rights reserved.
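The abstract mentions curating a balanced set of experiences to improve the reinforcement learning process. The paper's exact mechanism is not reproduced in this record; as an illustrative sketch only (the class name, pool capacities, and the success/failure split are assumptions, not the authors' implementation), a balanced replay buffer that prevents rare successful episodes from being drowned out by failures might look like:

```python
import random
from collections import deque

class BalancedReplayBuffer:
    """Keep separate pools of successful and failed transitions and
    sample them in equal proportion for each training batch."""

    def __init__(self, capacity_per_pool=50_000):
        self.success = deque(maxlen=capacity_per_pool)
        self.failure = deque(maxlen=capacity_per_pool)

    def add(self, transition, reached_goal):
        # Route each transition to the pool matching its outcome.
        (self.success if reached_goal else self.failure).append(transition)

    def sample(self, batch_size):
        half = batch_size // 2
        pools = [p for p in (self.success, self.failure) if p]
        if len(pools) == 1:
            # Only one pool has data yet: sample from whatever is available.
            pool = list(pools[0])
            return random.sample(pool, min(batch_size, len(pool)))
        # Draw half the batch from each pool, then shuffle the order.
        batch = random.sample(list(self.success), min(half, len(self.success)))
        batch += random.sample(list(self.failure),
                               min(batch_size - len(batch), len(self.failure)))
        random.shuffle(batch)
        return batch
```

Sampling in equal proportion keeps the gradient signal from collapsing toward the far more frequent failure cases early in training, when successful pushes are still rare.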
Pages: 119-134
Page count: 16
Related Papers
50 results
  • [1] End-to-End Deep Reinforcement Learning for Exoskeleton Control
    Rose, Lowell
    Bazzocchi, Michael C. F.
    Nejat, Goldie
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 4294 - 4301
  • [2] NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning
    Haj-Ali, Ameer
    Ahmed, Nesreen K.
    Willke, Ted
    Shao, Yakun Sophia
    Asanovic, Krste
    Stoica, Ion
    [J]. CGO'20: PROCEEDINGS OF THE 18TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2020, : 242 - 255
  • [3] End-to-End Race Driving with Deep Reinforcement Learning
    Jaritz, Maximilian
    de Charette, Raoul
    Toromanoff, Marin
    Perot, Etienne
    Nashashibi, Fawzi
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 2070 - 2075
  • [4] End-to-end Control of Kart Agent with Deep Reinforcement Learning
    Zhang Ruiming
    Liu Chengju
    Chen Qijun
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018, : 1688 - 1693
  • [5] End-to-end Deep Reinforcement Learning Based Coreference Resolution
    Fei, Hongliang
    Li, Xu
    Li, Dingcheng
    Li, Ping
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 660 - 665
  • [6] Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
    Yuan, Weihao
    Stork, Johannes A.
    Kragic, Danica
    Wang, Michael Y.
    Hang, Kaiyu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 270 - 277
  • [7] End-to-End Autonomous Driving Decision Based on Deep Reinforcement Learning
    Huang, Zhiqing
    Zhang, Ji
    Tian, Rui
    Zhang, Yanxin
    [J]. CONFERENCE PROCEEDINGS OF 2019 5TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS (ICCAR), 2019, : 658 - 662
  • [8] Optimization of Neuroprosthetic Vision via End-to-End Deep Reinforcement Learning
    Kucukoglu, Burcu
    Rueckauer, Bodo
    Ahmad, Nasir
    van Steveninck, Jaap de Ruyter
    Guclu, Umut
    van Gerven, Marcel
    [J]. INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2022, 32 (11)
  • [9] End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation
    Ruan, Xiaogang
    Li, Peng
    Zhu, Xiaoqing
    Yu, Hejie
    Yu, Naigong
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021