Learning From Sparse Demonstrations

Cited by: 7
Authors
Jin, Wanxin [1 ]
Murphey, Todd D. [2 ]
Kulic, Dana [3 ]
Ezer, Neta [4 ]
Mou, Shaoshuai [5 ]
Affiliations
[1] Univ Penn, Gen Robot Automat Sensing & Percept Lab, Philadelphia, PA 19104 USA
[2] Northwestern Univ, Dept Mech Engn, Evanston, IL 60208 USA
[3] Monash Univ, Clayton, Vic 3800, Australia
[4] Northrop Grumman Corp, Linthicum Hts, MD 21090 USA
[5] Purdue Univ, Sch Aeronaut & Astronaut, W Lafayette, IN 47906 USA
Funding
National Science Foundation (USA)
Keywords
Robots; Trajectory; Linear programming; Task analysis; Optimal control; Costs; Cost function; Inverse optimal control (IOC); inverse reinforcement learning (IRL); learning from demonstrations (LfD); motion planning; optimal control; Pontryagin differentiable programming (PDP); TIME; MANIPULATION; OPTIMIZATION; ALGORITHM;
DOI
10.1109/TRO.2022.3191592
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
In this article, we develop the method of continuous Pontryagin differentiable programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.
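The projected-gradient idea in the abstract can be illustrated with a toy sketch. This is not the Continuous PDP algorithm: the closed-form `robot_output`, the single parameter `theta`, the linear time warp `t = beta * tau`, the bounds on `beta`, and the finite-difference gradient are all assumptions chosen for illustration. The paper instead differentiates the robot's optimal trajectory analytically with respect to the unknown parameters.

```python
import math

def robot_output(t, theta):
    """Hypothetical task-space output at robot time t (stand-in for a real trajectory)."""
    return theta * (1.0 - math.exp(-t))

def discrepancy_loss(params, keyframes):
    """Sum of squared gaps between the warped trajectory and the keyframes."""
    theta, beta = params
    return sum((robot_output(beta * tau, theta) - y) ** 2 for tau, y in keyframes)

def numerical_gradient(params, keyframes, eps=1e-6):
    """Central finite differences; a swap-in for the analytic gradient used in the paper."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        diff = discrepancy_loss(up, keyframes) - discrepancy_loss(down, keyframes)
        grad.append(diff / (2.0 * eps))
    return grad

def project(params, beta_bounds=(0.1, 10.0)):
    """Projection step: clamp the time-warp scale beta into an assumed feasible interval."""
    theta, beta = params
    lo, hi = beta_bounds
    return [theta, min(max(beta, lo), hi)]

def fit(keyframes, init=(1.0, 1.0), step=0.05, iters=2000):
    """Projected gradient descent over (theta, beta)."""
    params = project(list(init))
    for _ in range(iters):
        grad = numerical_gradient(params, keyframes)
        params = project([p - step * g for p, g in zip(params, grad)])
    return params

# Three sparse keyframes (time stamp, desired output), generated with theta=2.0, beta=0.5.
keyframes = [(tau, robot_output(0.5 * tau, 2.0)) for tau in (1.0, 2.0, 4.0)]
initial_loss = discrepancy_loss([1.0, 1.0], keyframes)
theta_hat, beta_hat = fit(keyframes)
final_loss = discrepancy_loss([theta_hat, beta_hat], keyframes)
print(theta_hat, beta_hat, initial_loss, final_loss)
```

The keyframe time stamps `tau` are deliberately misaligned with robot time by the factor `beta`, so the fit has to recover both the trajectory parameter and the time warp from only three points.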
Pages: 645-664
Page count: 20
Related papers
50 in total
  • [1] Sparse Reward based Manipulator Motion Planning by Using High Speed Learning from Demonstrations
    Zuo, Guoyu
    Lu, Jiahao
    Pan, Tingting
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018, : 518 - 523
  • [2] Learning to Generalize from Demonstrations
    Browne, Katie
    Nicolescu, Monica
    [J]. CYBERNETICS AND INFORMATION TECHNOLOGIES, 2012, 12 (03) : 27 - 38
  • [3] Learning from Corrective Demonstrations
    Gutierrez, Reymundo A.
    Short, Elaine Schaertl
    Niekum, Scott
    Thomaz, Andrea L.
    [J]. HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019, : 712 - 714
  • [4] Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
    Rengarajan, Desik
    Chaudhary, Sapana
    Kim, Jaewon
    Kalathil, Dileep
    Shakkottai, Srinivas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Learning Options for an MDP from Demonstrations
    Tamassia, Marco
    Zambetta, Fabio
    Raffe, William
    Li, Xiaodong
    [J]. ARTIFICIAL LIFE AND COMPUTATIONAL INTELLIGENCE, 2015, 8955 : 226 - 242
  • [6] Robot Learning to Paint from Demonstrations
    Park, Younghyo
    Jeon, Seunghun
    Lee, Taeyoon
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3053 - 3060
  • [7] Learning Task Priorities From Demonstrations
    Silverio, Joao
    Calinon, Sylvain
    Rozo, Leonel
    Caldwell, Darwin G.
    [J]. IEEE TRANSACTIONS ON ROBOTICS, 2019, 35 (01) : 78 - 94
  • [8] Learning Task Specifications from Demonstrations
    Vazquez-Chanlatte, Marcell
    Jha, Susmit
    Tiwari, Ashish
    Ho, Mark K.
    Seshia, Sanjit A.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [9] Reward Learning from Narrated Demonstrations
    Tung, Hsiao-Yu
    Harley, Adam W.
    Huang, Liang-Kang
    Fragkiadaki, Katerina
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7004 - 7013
  • [10] Learning from Demonstration without Demonstrations
    Blau, Tom
    Morere, Philippe
    Francis, Gilad
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4116 - 4122