Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Cited by: 3
Authors
Benotti, Luciana [1, 2]
Lau, Tessa [3]
Villalba, Martin [1, 4]
Affiliations
[1] Univ Nacl Cordoba, Cordoba, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
[3] Savioke Inc, Sunnyvale, CA USA
[4] Univ Potsdam, D-14476 Potsdam, Germany
Keywords
Design; Algorithms; Performance; Natural language interpretation; multimodal understanding; action recognition; visual feedback; situated virtual agent; unsupervised learning
DOI
10.1145/2629632
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
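The classification framing in the abstract (choosing among the actions that are possible in the situation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class `ToyInstructionInterpreter`, the action labels, and the word-count scoring are hypothetical and use language features only; they are not the classifiers, features, or corpora used in the article.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


class ToyInstructionInterpreter:
    """Hypothetical word-count scorer; not the article's classifier."""

    def __init__(self):
        # Maps each action label to the counts of words seen with it.
        self.word_counts = defaultdict(Counter)

    def train(self, pairs):
        """pairs: iterable of (instruction, action) training examples."""
        for instruction, action in pairs:
            self.word_counts[action].update(tokenize(instruction))

    def interpret(self, instruction, possible_actions):
        """Pick the best-scoring action among those possible right now."""
        words = tokenize(instruction)

        def score(action):
            counts = self.word_counts[action]
            total = sum(counts.values()) or 1  # avoid division by zero
            # Sum of relative frequencies of the instruction's words.
            return sum(counts[w] / total for w in words)

        return max(possible_actions, key=score)


# Invented (instruction, action) pairs standing in for annotated data.
training_pairs = [
    ("press the red button", "press(red_button)"),
    ("push the red one", "press(red_button)"),
    ("go through the door", "move(door)"),
    ("walk to the door", "move(door)"),
    ("press the blue button", "press(blue_button)"),
]

interpreter = ToyInstructionInterpreter()
interpreter.train(training_pairs)

# The same instruction is resolved against different sets of possible actions.
choice_a = interpreter.interpret(
    "push the red button", ["press(red_button)", "move(door)"]
)
choice_b = interpreter.interpret(
    "push the red button", ["move(door)", "press(blue_button)"]
)
```

Restricting the candidates to `possible_actions` is the point of the sketch: the same instruction can map to different actions depending on what the situation allows.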
Pages: 22