Learning From Sparse Demonstrations

Cited by: 7

Authors:

Jin, Wanxin [1]

Murphey, Todd D. [2]

Kulic, Dana [3]

Ezer, Neta [4]

Mou, Shaoshuai [5]

Affiliations:

[1] Univ Penn, Gen Robot Automat Sensing & Percept Lab, Philadelphia, PA 19104 USA

[2] Northwestern Univ, Dept Mech Engn, Evanston, IL 60208 USA

[3] Monash Univ, Clayton, Vic 3800, Australia

[4] Northrop Grumman Corp, Linthicum Hts, MD 21090 USA

[5] Purdue Univ, Sch Aeronaut & Astronaut, W Lafayette, IN 47906 USA

Source:

IEEE TRANSACTIONS ON ROBOTICS | 2023 / Vol. 39 / No. 01

Funding:

National Science Foundation (USA);

Keywords:

Robots; Trajectory; Linear programming; Task analysis; Optimal control; Costs; Cost function; Inverse optimal control (IOC); inverse reinforcement learning (IRL); learning from demonstrations (LfD); motion planning; optimal control; Pontryagin differentiable programming (PDP); TIME; MANIPULATION; OPTIMIZATION; ALGORITHM;

DOI:

10.1109/TRO.2022.3191592

Chinese Library Classification (CLC) number:

TP24 [Robotics technology];

Discipline classification codes:

080202 ; 1405 ;

Abstract:

In this article, we develop the method of continuous Pontryagin differentiable programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.
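The joint search described in the abstract (an objective parameter plus a time-warping parameter, fitted to sparse keyframes by projected gradient descent) can be illustrated with a toy sketch. This is not the paper's Continuous PDP algorithm. It assumes a hypothetical scalar system dx/dtau = u with infinite-horizon cost (x - theta)^2 + u^2, whose optimal trajectory has the closed form x(tau) = theta + (x0 - theta) exp(-tau), and a linear time warp tau = beta * t. Because of that closed form, the gradient is written out analytically here, whereas the paper obtains it by differentiating through the optimal control problem. All names (`trajectory`, `fit`, `theta`, `beta`, `beta_min`) and all numeric values are illustrative assumptions.

```python
import math

X0 = 0.0  # assumed initial state of the toy system


def trajectory(theta, beta, t):
    """Toy optimal state at demonstration time t, with time warp tau = beta * t."""
    return theta + (X0 - theta) * math.exp(-beta * t)


def loss_and_grad(theta, beta, keyframes):
    """Squared keyframe discrepancy and its analytic gradient w.r.t. (theta, beta)."""
    loss = g_theta = g_beta = 0.0
    for t, y in keyframes:
        decay = math.exp(-beta * t)
        r = theta + (X0 - theta) * decay - y          # residual at this keyframe
        loss += r * r
        g_theta += 2.0 * r * (1.0 - decay)            # d r / d theta = 1 - decay
        g_beta += 2.0 * r * (X0 - theta) * (-t) * decay  # d r / d beta
    return loss, g_theta, g_beta


def fit(keyframes, theta=1.0, beta=1.0, step=0.02, iters=20000, beta_min=1e-3):
    """Projected gradient descent: beta is projected onto [beta_min, inf)."""
    for _ in range(iters):
        _, g_theta, g_beta = loss_and_grad(theta, beta, keyframes)
        theta -= step * g_theta
        beta = max(beta_min, beta - step * g_beta)    # projection keeps the warp positive
    return theta, beta


# Three sparse keyframes generated from assumed "true" values theta=2.0, beta=0.5.
keyframes = [(t, trajectory(2.0, 0.5, t)) for t in (1.0, 2.0, 4.0)]
theta_hat, beta_hat = fit(keyframes)
final_loss = loss_and_grad(theta_hat, beta_hat, keyframes)[0]
print(theta_hat, beta_hat, final_loss)
```

The projection step is what keeps the time warp valid (strictly positive scaling); in this toy it is a single `max`, while the paper's time-warping function and feasible set may be more general.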

Pages: 645-664

Page count: 20
