Graph-Structured Representations for Visual Question Answering

Cited by: 227
Authors
Teney, Damien [1]
Liu, Lingqiao [1]
van den Hengel, Anton [1]
Affiliation
[1] Univ Adelaide, Australian Ctr Visual Technol, Adelaide, SA, Australia
DOI
10.1109/CVPR.2017.344
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the need for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as a series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. We show that this approach achieves significant improvements over the state of the art, increasing accuracy from 71.2% to 74.4% on the "abstract scenes" multiple-choice benchmark, and from 34.7% to 39.1% for the more challenging "balanced" scenes, i.e. image pairs with fine-grained differences and opposite yes/no answers to the same question.
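As a rough illustration of the idea of building graphs over scene objects and over question words, the sketch below shows one possible way to hold both inputs as graphs. It is not the authors' implementation: the `Graph` class, the `(label, x, y)` object format, the fully connected scene edges carrying relative offsets, and the adjacent-word question edges are all assumptions made for this example, since the abstract does not specify how nodes and edges are defined.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Graph:
    """Hypothetical container: a node list plus a dict of directed edges."""
    nodes: list
    edges: dict = field(default_factory=dict)  # (i, j) -> edge feature


def build_scene_graph(objects):
    """Build a graph from (label, x, y) tuples (assumed input format)."""
    graph = Graph(nodes=[label for label, _, _ in objects])
    # Assumption: connect every ordered pair of objects and store the
    # relative offset of the target with respect to the source.
    for i, j in permutations(range(len(objects)), 2):
        _, xi, yi = objects[i]
        _, xj, yj = objects[j]
        graph.edges[(i, j)] = (xj - xi, yj - yi)
    return graph


def build_question_graph(question):
    """Build a graph with one node per question word."""
    words = question.lower().rstrip("?").split()
    graph = Graph(nodes=words)
    # Placeholder edges: link adjacent words only. A syntactic parse
    # could supply richer edges; that choice is left open here.
    for i in range(len(words) - 1):
        graph.edges[(i, i + 1)] = "next"
    return graph


# Two instances of the same object class stay separate nodes, which is
# the kind of situation a single feature vector struggles to express.
scene = build_scene_graph([("dog", 10, 40), ("dog", 60, 42), ("ball", 35, 50)])
question = build_question_graph("Is the ball between the dogs?")
```

In this sketch the two dogs remain distinct nodes with their own spatial relations to the ball, whereas a single pooled feature vector would merge them.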
Pages: 3233-3241
Page count: 9