Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Cited by: 3
Authors
Benotti, Luciana [1, 2]
Lau, Tessa [3]
Villalba, Martin [1, 4]
Affiliations
[1] Univ Nacl Cordoba, Cordoba, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
[3] Savioke Inc, Sunnyvale, CA USA
[4] Univ Potsdam, D-14476 Potsdam, Germany
Keywords
Design; Algorithms; Performance; Natural language interpretation; multimodal understanding; action recognition; visual feedback; situated virtual agent; unsupervised learning;
DOI
10.1145/2629632
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
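The classification framing in the abstract (choosing among the actions that are possible in the current situation) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a small naive Bayes model over instruction words only, ignores the vision and behavior features the article combines, and assumes the automatic annotation phase has already produced (instruction, action) pairs. The function names `train` and `interpret`, the action labels, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def train(pairs):
    """Count words per action from (instruction, action) pairs.

    The pairs stand in for the automatically annotated training data;
    their format here is an assumption made for illustration.
    """
    word_counts = {}
    action_counts = Counter()
    for instruction, action in pairs:
        action_counts[action] += 1
        word_counts.setdefault(action, Counter()).update(instruction.lower().split())
    return word_counts, action_counts


def interpret(instruction, possible_actions, model):
    """Return the most probable action among those possible right now."""
    word_counts, action_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(action_counts.values())
    words = instruction.lower().split()

    def score(action):
        counts = word_counts.get(action, Counter())
        n_words = sum(counts.values())
        # Laplace-smoothed log prior over the currently possible actions.
        s = math.log((action_counts[action] + 1) / (total + len(possible_actions)))
        # Laplace-smoothed log likelihood; the +1 reserves mass for unseen words.
        for w in words:
            s += math.log((counts[w] + 1) / (n_words + len(vocab) + 1))
        return s

    return max(possible_actions, key=score)


# Hypothetical toy data, not taken from the corpora used in the article.
TOY_PAIRS = [
    ("push the red button", "press_red"),
    ("press the red button", "press_red"),
    ("push the blue button", "press_blue"),
    ("go through the door", "move_door"),
    ("walk to the door", "move_door"),
]

if __name__ == "__main__":
    model = train(TOY_PAIRS)
    print(interpret("hit the red button", ["press_red", "press_blue", "move_door"], model))
```

Restricting the argmax to `possible_actions` is the point of the sketch: the same instruction can resolve to a different action when the situation rules out the otherwise best candidate.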
Pages: 22
Related papers
50 entries in total
  • [31] BEHAVIOR OF NATURAL-LANGUAGE STRUCTURES IN THE LANGUAGE OF SUBJECT HEADLINES
    RUCHIMSKAYA, EM
    NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1992, (02): : 10 - 14
  • [32] Translating Natural Language Instructions to Computer Programs for Robot Manipulation
    Venkatesh, Sagar Gubbi
    Upadrashta, Raviteja
    Amrutur, Bharadwaj
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 1919 - 1926
  • [33] Few-Shot Text Generation with Natural Language Instructions
    Schick, Timo
    Schuetze, Hinrich
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 390 - 402
  • [34] Spatial Reasoning from Natural Language Instructions for Robot Manipulation
    Venkatesh, Sagar Gubbi
    Biswas, Anirban
    Upadrashta, Raviteja
    Srinivasan, Vikram
    Talukdar, Partha
    Amrutur, Bharadwaj
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 11196 - 11202
  • [35] Hierarchical Decision Making by Generating and Following Natural Language Instructions
    Hu, Hengyuan
    Yarats, Denis
    Gong, Qucheng
    Tian, Yuandong
    Lewis, Mike
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [36] Spatial References and Perspective in Natural Language Instructions for Collaborative Manipulation
    Li, Shen
    Scalise, Rosario
    Admoni, Henny
    Rosenthal, Stephanie
    Srinivasa, Siddhartha S.
    2016 25TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN), 2016, : 44 - 51
  • [37] Natural language instructions for human-robot collaborative manipulation
    Scalise, Rosario
    Li, Shen
    Admoni, Henny
    Rosenthal, Stephanie
    Srinivasa, Siddhartha S.
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2018, 37 (06): : 558 - 565
  • [38] Teaching Robots New Actions through Natural Language Instructions
    She, Lanbo
    Cheng, Yu
    Chai, Joyce Y.
    Jia, Yunyi
    Yang, Shaohua
    Xi, Ning
    2014 23RD IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (IEEE RO-MAN), 2014, : 868 - 873
  • [39] A Model for Verifiable Grounding and Execution of Complex Natural Language Instructions
    Boteanu, Adrian
    Howard, Thomas
    Arkin, Jacob
    Kress-Gazit, Hadas
    2016 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2016), 2016, : 2649 - 2654
  • [40] Natural language instructions induce compositional generalization in networks of neurons
    Riveland, Reidar
    Pouget, Alexandre
    NATURE NEUROSCIENCE, 2024, 27 (05) : 988 - 999