Object Graph Networks for Spatial Language Grounding

Cited by: 6
Authors
Hawkins, Philip [1]
Maire, Frederic [1]
Denman, Simon [1]
Baktashmotlagh, Mahsa [2]
Affiliations
[1] Queensland Univ Technol, Elect Engn Comp Sci Sch, Brisbane, Qld, Australia
[2] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
DOI
10.1109/dicta47822.2019.8946101
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Consider a domestic robot being asked to pick up "the cup nearest to the plate". Natural language is an intuitive way for humans to interact with robots. However, enabling robots to comprehend natural language and correctly interpret spatial references is challenging for two reasons. First, phrases must be represented semantically in structures that can be processed computationally; second, correspondences must be found that map these structures to models representing objects, relationships and actions in the environment. Recently, neural networks have shown strong potential to address both challenges, most notably in Visual Question Answering (VQA), where they have performed well at answering natural language questions about images. However, state-of-the-art VQA networks are not directly applicable to robotic applications: they do not provide interfaces suitable for integration with a robotic system, and most have limited capacity to interpret spatial phrases. In this paper, we present a neural network architecture trained on synthetic data and evaluated on both synthetic and real data. It correctly interprets referring spatial relationships in phrases such as the one above and provides a modular interface that allows a robot to localise an object in the environment from such a phrase.
Pages: 1 / 8
Page count: 8