End-to-end nonprehensile rearrangement with deep reinforcement learning and simulation-to-reality transfer

Cited by: 30
Authors
Yuan, Weihao [1]
Hang, Kaiyu [2]
Kragic, Danica [3]
Wang, Michael Y. [1]
Stork, Johannes A. [4]
Affiliations
[1] Hong Kong Univ Sci & Technol, ECE, Robot Inst, Hong Kong, Peoples R China
[2] Yale Univ, Mech Engn & Mat Sci, New Haven, CT USA
[3] KTH Royal Inst Technol, EECS, Ctr Autonomous Syst, Stockholm, Sweden
[4] Orebro Univ, Ctr Appl Autonomous Sensor Syst, Orebro, Sweden
Keywords
Nonprehensile rearrangement; Deep reinforcement learning; Transfer learning
DOI
10.1016/j.robot.2019.06.007
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Nonprehensile rearrangement is the problem of controlling a robot to interact with objects through pushing actions in order to reconfigure the objects into a predefined goal pose. In this work, we rearrange one object at a time in an environment with obstacles using an end-to-end policy that maps raw pixels as visual input to control actions without any form of engineered feature extraction. To reduce the amount of training data that needs to be collected with a real robot, we propose a simulation-to-reality transfer approach. In the first step, we model the nonprehensile rearrangement task in simulation and use deep reinforcement learning to learn a suitable rearrangement policy, which requires on the order of hundreds of thousands of example actions for training. Thereafter, we collect a small dataset of only 70 episodes of real-world actions as supervised examples for adapting the learned rearrangement policy to real-world input data. In this process, we make use of newly proposed strategies for improving the reinforcement learning process, such as heuristic exploration and the curation of a balanced set of experiences. We evaluate our method in both simulation and a real-world setting using a Baxter robot to show that the proposed approach can effectively improve the training process in simulation, as well as efficiently adapt the learned policy to the real-world application, even when the camera pose differs from simulation. Additionally, we show that the learned system not only provides adaptive behavior to handle unforeseen events during execution, such as distracting objects, sudden changes in the positions of the objects, and obstacles, but also can deal with obstacle shapes that were not present during training. (C) 2019 Elsevier B.V. All rights reserved.
Pages: 119-134
Number of pages: 16
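
The abstract above outlines a two-stage pipeline: a pixel-to-action pushing policy trained with deep reinforcement learning in simulation, then adapted to the real robot with a small set of supervised real-world episodes, supported by a curated, balanced set of experiences. Below is a minimal, illustrative sketch of that idea in PyTorch; the network layout, the `BalancedReplay` buffer, the `adapt_to_real` routine, and all hyperparameters are assumptions made for illustration, not the authors' actual implementation.

```python
# Illustrative sketch only: the paper's exact architecture and training details
# are not reproduced here; all names and hyperparameters below are assumptions.
import random
from collections import deque

import torch
import torch.nn as nn


class PixelPolicy(nn.Module):
    """Small CNN mapping raw pixel frames to logits over discrete pushing actions."""

    def __init__(self, num_actions: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(num_actions)  # action logits

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(frames))


class BalancedReplay:
    """Keeps rewarded and unrewarded transitions in separate pools so sampled
    mini-batches stay balanced, mirroring the 'curated balanced set of
    experiences' idea mentioned in the abstract."""

    def __init__(self, capacity: int = 50_000):
        self.positive = deque(maxlen=capacity // 2)
        self.negative = deque(maxlen=capacity // 2)

    def add(self, transition, reward: float):
        (self.positive if reward > 0 else self.negative).append(transition)

    def sample(self, batch_size: int):
        half = batch_size // 2
        pos = random.sample(self.positive, min(half, len(self.positive)))
        neg = random.sample(self.negative, min(half, len(self.negative)))
        return pos + neg


def adapt_to_real(policy: PixelPolicy, real_batches, epochs: int = 20):
    """Supervised adaptation on a small set of real-world (frames, action) batches,
    as in the second stage described in the abstract (roughly 70 episodes)."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, actions in real_batches:  # frames: (B,3,H,W), actions: (B,) long
            loss = loss_fn(policy(frames), actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```

Splitting the replay buffer by outcome is one simple way to keep rare successful pushes from being drowned out by failed ones early in training; the paper's actual curation strategy may differ.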