Graph-Structured Representations for Visual Question Answering

Cited by: 227

Authors:

Teney, Damien [1]

Liu, Lingqiao [1]

van den Hengel, Anton [1]

Affiliations:

[1] Univ Adelaide, Australian Ctr Visual Technol, Adelaide, SA, Australia

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

DOI:

10.1109/CVPR.2017.344

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the need for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as sequences of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. We show that this approach achieves significant improvements over the state of the art, increasing accuracy from 71.2% to 74.4% on the "abstract scenes" multiple-choice benchmark, and from 34.7% to 39.1% on the more challenging "balanced" scenes, i.e. image pairs with fine-grained differences and opposite yes/no answers to the same question.

Pages: 3233-3241

Page count: 9

