Learning From Sparse Demonstrations

Cited by: 7

Authors:

Jin, Wanxin [1]

Murphey, Todd D. [2]

Kulic, Dana [3]

Ezer, Neta [4]

Mou, Shaoshuai [5]

Affiliations:

[1] Univ Penn, Gen Robot Automat Sensing & Percept Lab, Philadelphia, PA 19104 USA

[2] Northwestern Univ, Dept Mech Engn, Evanston, IL 60208 USA

[3] Monash Univ, Clayton, Vic 3800, Australia

[4] Northrop Grumman Corp, Linthicum Hts, MD 21090 USA

[5] Purdue Univ, Sch Aeronaut & Astronaut, W Lafayette, IN 47906 USA

Source:

IEEE TRANSACTIONS ON ROBOTICS | 2023 / Vol. 39 / No. 01

Funding:

U.S. National Science Foundation;

Keywords:

Robots; Trajectory; Linear programming; Task analysis; Optimal control; Costs; Cost function; Inverse optimal control (IOC); inverse reinforcement learning (IRL); learning from demonstrations (LfD); motion planning; optimal control; Pontryagin differentiable programming (PDP); TIME; MANIPULATION; OPTIMIZATION; ALGORITHM;

DOI:

10.1109/TRO.2022.3191592

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification codes:

080202 ; 1405 ;

Abstract:

In this article, we develop the method of continuous Pontryagin differentiable programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions.
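
The idea in the abstract (jointly fitting an objective parameter and a time-warping parameter by projected gradient descent on a keyframe discrepancy loss) can be illustrated with a toy sketch. This is not the paper's Continuous PDP: the closed-form `trajectory` below stands in for solving an optimal control problem, the scalar `theta` stands in for the objective-function parameters, and the linear warp `tau = beta * t` with box bounds is an assumed form of the time-warping function. All names and numbers are hypothetical.

```python
import math


def trajectory(tau, theta):
    """Toy stand-in for the robot output at execution time tau (assumed form)."""
    return theta * (1.0 - math.exp(-tau))


def fit_keyframes(keyframes, theta=1.0, beta=1.0, lr=0.05, iters=5000,
                  beta_bounds=(0.1, 5.0)):
    """Projected gradient descent on the squared keyframe discrepancy.

    keyframes: list of (time stamp t, desired output y).
    theta: toy objective parameter; beta: slope of the warp tau = beta * t.
    """
    lo, hi = beta_bounds
    for _ in range(iters):
        g_theta = 0.0
        g_beta = 0.0
        for t, y in keyframes:
            decay = math.exp(-beta * t)
            residual = theta * (1.0 - decay) - y
            # Analytic gradients of the squared residual w.r.t. theta and beta.
            g_theta += 2.0 * residual * (1.0 - decay)
            g_beta += 2.0 * residual * theta * t * decay
        theta -= lr * g_theta
        beta -= lr * g_beta
        # Projection step: keep the warp slope inside its feasible interval.
        beta = min(max(beta, lo), hi)
    loss = sum((trajectory(beta * t, theta) - y) ** 2 for t, y in keyframes)
    return theta, beta, loss


# Three sparse keyframes generated from hypothetical ground-truth parameters.
TRUE_THETA, TRUE_BETA = 2.0, 1.5
keyframes = [(t, trajectory(TRUE_BETA * t, TRUE_THETA)) for t in (0.5, 1.0, 2.0)]

theta, beta, loss = fit_keyframes(keyframes)
print(f"theta={theta:.4f} beta={beta:.4f} loss={loss:.2e}")
```

In this sketch the keyframe time stamps differ from the execution time by the factor `beta`, which is recovered together with `theta`; in the method described above, the gradient of the trajectory with respect to the parameters would come from the Continuous PDP rather than from a closed-form expression.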


Pages: 645-664

Page count: 20

50 items in total

[1] Sparse Reward based Manipulator Motion Planning by Using High Speed Learning from Demonstrations
Zuo, Guoyu
Lu, Jiahao
Pan, Tingting
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018, : 518 - 523
[2] Learning to Generalize from Demonstrations
Browne, Katie
Nicolescu, Monica
[J]. CYBERNETICS AND INFORMATION TECHNOLOGIES, 2012, 12 (03) : 27 - 38
[3] Learning from Corrective Demonstrations
Gutierrez, Reymundo A.
Short, Elaine Schaertl
Niekum, Scott
Thomaz, Andrea L.
[J]. HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019, : 712 - 714
[4] Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
Rengarajan, Desik
Chaudhary, Sapana
Kim, Jaewon
Kalathil, Dileep
Shakkottai, Srinivas
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[5] Learning Options for an MDP from Demonstrations
Tamassia, Marco
Zambetta, Fabio
Raffe, William
Li, Xiaodong
[J]. ARTIFICIAL LIFE AND COMPUTATIONAL INTELLIGENCE, 2015, 8955 : 226 - 242
[6] Robot Learning to Paint from Demonstrations
Park, Younghyo
Jeon, Seunghun
Lee, Taeyoon
[J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3053 - 3060
[7] Learning Task Priorities From Demonstrations
Silverio, Joao
Calinon, Sylvain
Rozo, Leonel
Caldwell, Darwin G.
[J]. IEEE TRANSACTIONS ON ROBOTICS, 2019, 35 (01) : 78 - 94
[8] Learning Task Specifications from Demonstrations
Vazquez-Chanlatte, Marcell
Jha, Susmit
Tiwari, Ashish
Ho, Mark K.
Seshia, Sanjit A.
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[9] Reward Learning from Narrated Demonstrations
Tung, Hsiao-Yu
Harley, Adam W.
Huang, Liang-Kang
Fragkiadaki, Katerina
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7004 - 7013
[10] Learning from Demonstration without Demonstrations
Blau, Tom
Morere, Philippe
Francis, Gilad
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4116 - 4122
