Reward Learning from Narrated Demonstrations

Cited by: 4
Authors
Tung, Hsiao-Yu [1]
Harley, Adam W. [1]
Huang, Liang-Kang [1]
Fragkiadaki, Katerina [1]
Affiliations
[1] Carnegie Mellon Univ, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
DOI
10.1109/CVPR.2018.00732
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Humans effortlessly "program" one another by communicating goals and desires in natural language. In contrast, humans program robotic behaviours by indicating desired object locations and poses to be achieved, by providing RGB images of goal configurations, or by supplying a demonstration to be imitated. None of these methods generalize across environment variations, and they convey the goal in awkward technical terms. This work proposes joint learning of natural language grounding and instructable behavioural policies reinforced by perceptual detectors of natural language expressions, grounded to the sensory inputs of the robotic agent. Our supervision is narrated visual demonstrations (NVD), which are visual demonstrations paired with verbal narration (as opposed to being silent). We introduce a dataset of NVD where teachers perform activities while describing them in detail. We map the teachers' descriptions to perceptual reward detectors, and use them to train corresponding behavioural policies in simulation. We empirically show that our instructable agents (i) learn visual reward detectors using a small number of examples by exploiting hard negative mined configurations from demonstration dynamics, (ii) develop pick-and-place policies using learned visual reward detectors, (iii) benefit from object-factorized state representations that mimic the syntactic structure of natural language goal expressions, and (iv) can execute behaviours that involve novel objects in novel locations at test time, instructed by natural language.
Pages: 7004-7013
Page count: 10
Related papers
50 in total
  • [21] Learning to Generalize from Demonstrations
    Browne, Katie
    Nicolescu, Monica
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2012, 12 (03) : 27 - 38
  • [22] Learning from Corrective Demonstrations
    Gutierrez, Reymundo A.
    Short, Elaine Schaertl
    Niekum, Scott
    Thomaz, Andrea L.
    HRI '19: 2019 14TH ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2019, : 712 - 714
  • [23] Unsupervised Learning from Narrated Instruction Videos
    Alayrac, Jean-Baptiste
    Bojanowski, Piotr
    Agrawal, Nishant
    Sivic, Josef
    Laptev, Ivan
    Lacoste-Julien, Simon
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 4575 - 4583
  • [24] Joint Estimation of Expertise and Reward Preferences From Human Demonstrations
    Carreno-Medrano, Pamela
    Smith, Stephen L.
    Kulic, Dana
    IEEE TRANSACTIONS ON ROBOTICS, 2023, 39 (01) : 681 - 698
  • [25] Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
    Wilcox, Albert
    Balakrishna, Ashwin
    Dedieu, Jules
    Benslimane, Wyame
    Brown, Daniel S.
    Goldberg, Ken
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [26] Learning Options for an MDP from Demonstrations
    Tamassia, Marco
    Zambetta, Fabio
    Raffe, William
    Li, Xiaodong
    ARTIFICIAL LIFE AND COMPUTATIONAL INTELLIGENCE, 2015, 8955 : 226 - 242
  • [27] Learning Task Priorities From Demonstrations
    Silverio, Joao
    Calinon, Sylvain
    Rozo, Leonel
    Caldwell, Darwin G.
    IEEE TRANSACTIONS ON ROBOTICS, 2019, 35 (01) : 78 - 94
  • [28] Robot Learning to Paint from Demonstrations
    Park, Younghyo
    Jeon, Seunghun
    Lee, Taeyoon
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 3053 - 3060
  • [29] Learning Task Specifications from Demonstrations
    Vazquez-Chanlatte, Marcell
    Jha, Susmit
    Tiwari, Ashish
    Ho, Mark K.
    Seshia, Sanjit A.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [30] Learning Temporal Dynamics from Cycles in Narrated Video
    Epstein, Dave
    Wu, Jiajun
    Schmid, Cordelia
    Sun, Chen
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1460 - 1469