Learning From Sparse Demonstrations

Cited by: 7
Authors
Jin, Wanxin [1]
Murphey, Todd D. [2]
Kulic, Dana [3]
Ezer, Neta [4]
Mou, Shaoshuai [5]
Affiliations
[1] Univ Penn, Gen Robot Automat Sensing & Percept Lab, Philadelphia, PA 19104 USA
[2] Northwestern Univ, Dept Mech Engn, Evanston, IL 60208 USA
[3] Monash Univ, Clayton, Vic 3800, Australia
[4] Northrop Grumman Corp, Linthicum Hts, MD 21090 USA
[5] Purdue Univ, Sch Aeronaut & Astronaut, W Lafayette, IN 47906 USA
Funding
National Science Foundation (USA)
Keywords
Robots; Trajectory; Linear programming; Task analysis; Optimal control; Costs; Cost function; Inverse optimal control (IOC); inverse reinforcement learning (IRL); learning from demonstrations (LfD); motion planning; optimal control; Pontryagin differentiable programming (PDP); TIME; MANIPULATION; OPTIMIZATION; ALGORITHM;
DOI
10.1109/TRO.2022.3191592
Chinese Library Classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
In this article, we develop the method of continuous Pontryagin differentiable programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.
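The joint search described in the abstract can be illustrated with a minimal sketch. This is not the Continuous PDP algorithm: the robot trajectory here is a hypothetical closed-form curve rather than the solution of an optimal control problem, the gradient is written out by hand rather than obtained by differentiating Pontryagin's conditions, and the time-warping function is assumed to be a single scale factor. The names `theta`, `beta`, `KEYFRAMES`, and the box bounds are all assumptions made for illustration.

```python
import math

# Hypothetical keyframes (time stamp, desired output). They are generated from
# theta = 2.0 and beta = 0.5 so that the recovered values can be checked.
KEYFRAMES = [(t, 2.0 * (1.0 - math.exp(-0.5 * t))) for t in (1.0, 2.0, 4.0)]


def output(theta, beta, t):
    """Toy stand-in for the robot's output at the warped time beta * t."""
    return theta * (1.0 - math.exp(-beta * t))


def loss(theta, beta):
    """Discrepancy loss: squared distance to every keyframe."""
    return sum((output(theta, beta, t) - y) ** 2 for t, y in KEYFRAMES)


def gradient(theta, beta):
    """Analytic gradient of the loss with respect to (theta, beta)."""
    g_theta = 0.0
    g_beta = 0.0
    for t, y in KEYFRAMES:
        residual = output(theta, beta, t) - y
        g_theta += 2.0 * residual * (1.0 - math.exp(-beta * t))
        g_beta += 2.0 * residual * theta * t * math.exp(-beta * t)
    return g_theta, g_beta


def project(value, lower, upper):
    """Euclidean projection of a scalar onto the interval [lower, upper]."""
    return min(max(value, lower), upper)


def fit(theta=1.0, beta=1.0, step=0.05, iterations=5000):
    """Projected gradient descent over the assumed box constraints."""
    for _ in range(iterations):
        g_theta, g_beta = gradient(theta, beta)
        theta = project(theta - step * g_theta, 0.0, 10.0)
        beta = project(beta - step * g_beta, 0.1, 5.0)  # keep the warp positive
    return theta, beta


if __name__ == "__main__":
    theta, beta = fit()
    print(theta, beta, loss(theta, beta))
```

Each iteration takes a gradient step on the discrepancy loss and then projects both parameters back onto their feasible intervals, which is the generic shape of projected gradient descent. In the method the abstract describes, the hand-written `gradient` would be replaced by the gradient of the robot trajectory with respect to the unknown parameters.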
Pages: 645-664
Number of pages: 20