Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

Cited: 0
Authors
Gupta, Abhishek [1 ]
Kumar, Vikash [2 ]
Lynch, Corey [2 ]
Levine, Sergey [1 ]
Hausman, Karol [2 ]
Affiliations
[1] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[2] Google Brain, Mountain View, CA USA
Keywords
Hierarchical RL; Multi-task RL; Imitation Learning
DOI
None available
CLC Number
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general, universally applicable two-phase approach consists of an imitation learning phase that produces goal-conditioned hierarchical policies, which can then be easily improved by reinforcement-learning fine-tuning in the subsequent phase. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. In particular, we simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, in which the low-level policy acts for only a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of specific tasks. Instead, our approach can leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors that are not only less burdensome to provide, but can also greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment.
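The fixed-window data relabeling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, and the 30-step window are all hypothetical. The idea is that an unsegmented demonstration is sliced into goal-conditioned training tuples, where the low-level "goal" at time t is simply the state the demonstrator actually reached a fixed number of steps later.

```python
# Minimal sketch of fixed-window goal relabeling for relay policy learning.
# Function names, data layout, and the default window are illustrative
# assumptions, not the paper's code.

def relabel_low_level(states, actions, window=30):
    """Slice one unsegmented demonstration into low-level training
    tuples: at time t, the goal is the state reached `window` steps
    later (clipped to the final state), regardless of semantics."""
    data = []
    for t in range(len(actions)):
        goal_t = min(t + window, len(states) - 1)
        data.append({"state": states[t],
                     "goal": states[goal_t],
                     "action": actions[t]})
    return data

def relabel_high_level(states, window=30):
    """Produce high-level training tuples: conditioned on a distant
    final goal, the target subgoal is the state the demonstrator
    actually reached one window ahead."""
    data = []
    for t in range(len(states) - 1):
        subgoal_t = min(t + window, len(states) - 1)
        data.append({"state": states[t],
                     "final_goal": states[-1],
                     "subgoal": states[subgoal_t]})
    return data
```

Because every timestep yields a tuple and the window is fixed, a single demonstration of length T produces on the order of T goal-conditioned examples for each level, without any manual segmentation of the demonstration into tasks.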
Pages: 13
Related Papers
50 results in total
  • [1] The Art of Imitation: Learning Long-Horizon Manipulation Tasks From Few Demonstrations
    Von Hartz, Jan Ole
    Welschehold, Tim
    Valada, Abhinav
    Boedecker, Joschka
    IEEE Robotics and Automation Letters, 2024, 9 (12): 11369-11376
  • [2] Uncertainty-aware hierarchical reinforcement learning for long-horizon tasks
    Hu, Wenning
    Wang, Hongbin
    He, Ming
    Wang, Nianbin
    Applied Intelligence, 2023, 53 (23): 28555-28569
  • [3] Enhancing construction robot learning for collaborative and long-horizon tasks using generative adversarial imitation learning
    Li, Rui
    Zou, Zhengbo
    Advanced Engineering Informatics, 2023, 58
  • [4] Skill Learning for Long-Horizon Sequential Tasks
    Alves, Joao
    Lau, Nuno
    Silva, Filipe
    Progress in Artificial Intelligence, EPIA 2022, 2022, 13566: 713-724
  • [5] Disturbance Injection Under Partial Automation: Robust Imitation Learning for Long-Horizon Tasks
    Tahara, Hirotaka
    Sasaki, Hikaru
    Oh, Hanbit
    Anarossi, Edgar
    Matsubara, Takamitsu
    IEEE Robotics and Automation Letters, 2023, 8 (05): 2724-2731
  • [6] Learning a Skill-sequence-dependent Policy for Long-horizon Manipulation Tasks
    Li, Zhihao
    Sun, Zhenglong
    Su, Jionglong
    Zhang, Jiaming
    2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), 2021: 1229-1234
  • [7] Hierarchical Learning from Demonstrations for Long-Horizon Tasks
    Li, Boyao
    Li, Jiayi
    Lu, Tao
    Cai, Yinghao
    Wang, Shuo
    2021 IEEE International Conference on Robotics and Automation (ICRA 2021), 2021: 4545-4551
  • [8] MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
    Wang, Chen
    Fan, Linxi
    Sun, Jiankai
    Zhang, Ruohan
    Li, Fei-Fei
    Xu, Danfei
    Zhu, Yuke
    Anandkumar, Anima
    Conference on Robot Learning, Vol 229, 2023
  • [9] ISPIL: Interactive Sub-Goal-Planning Imitation Learning for Long-Horizon Tasks With Diverse Goals
    Ochoa, Cynthia
    Oh, Hanbit
    Kwon, Yuhwan
    Domae, Yukiyasu
    Matsubara, Takamitsu
    IEEE Access, 2024, 12: 197616-197631