A DIAGNOSTIC STUDY OF VISUAL QUESTION ANSWERING WITH ANALOGICAL REASONING

Cited by: 1

Authors:

Huang, Ziqi [1]

Zhu, Hongyuan [2]

Sun, Ying [2]

Choi, Dongkyu [3]

Tan, Cheston [2]

Lim, Joo-Hwee [1,2]

Affiliations:

[1] Nanyang Technol Univ, Singapore, Singapore

[2] ASTAR, I2R, Singapore, Singapore

[3] ASTAR, IHPC, Singapore, Singapore

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) | 2021

Keywords:

analogical reasoning; visual reasoning; Visual Question Answering (VQA); synthetic dataset; benchmark;

DOI:

10.1109/ICIP42928.2021.9506539

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The deep learning community has made rapid progress in low-level visual perception tasks such as object localization, detection, and segmentation. However, for tasks such as Visual Question Answering (VQA) and visual language grounding, which require high-level reasoning abilities, large gaps still exist between artificial systems and human intelligence. In this work, we perform a diagnostic study on recent popular VQA in terms of analogical reasoning. We term the task Analogical VQA: a system must reason over a group of images and find the analogical relations among them in order to correctly answer a natural language question. To study the task in depth, we propose an initial diagnostic synthetic dataset, CLEVR-Analogy, which tests a range of analogical reasoning abilities (e.g., reasoning on object attributes, spatial relationships, existence, and arithmetic analogies). We benchmark various recent state-of-the-art methods on our dataset, compare the results against human performance, and find that existing systems fall short when facing analogical reasoning that involves spatial relationships. The dataset and code will be made publicly available to facilitate future research.

Pages: 2463-2467

Page count: 5
