A DIAGNOSTIC STUDY OF VISUAL QUESTION ANSWERING WITH ANALOGICAL REASONING

Times cited: 1
Authors
Huang, Ziqi [1]
Zhu, Hongyuan [2]
Sun, Ying [2]
Choi, Dongkyu [3]
Tan, Cheston [2]
Lim, Joo-Hwee [1,2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] A*STAR, I2R, Singapore, Singapore
[3] A*STAR, IHPC, Singapore, Singapore
Keywords
analogical reasoning; visual reasoning; Visual Question Answering (VQA); synthetic dataset; benchmark
DOI
10.1109/ICIP42928.2021.9506539
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The deep learning community has made rapid progress in low-level visual perception tasks such as object localization, detection, and segmentation. However, for tasks such as Visual Question Answering (VQA) and visual language grounding that require high-level reasoning abilities, large gaps remain between artificial systems and human intelligence. In this work, we perform a diagnostic study of recent popular VQA models in terms of analogical reasoning. We term this task Analogical VQA: a system must reason over a group of images and find the analogical relations among them in order to correctly answer a natural language question. To study the task in depth, we propose CLEVR-Analogy, an initial diagnostic synthetic dataset that tests a range of analogical reasoning abilities (e.g., reasoning on object attributes, spatial relationships, existence, and arithmetic analogies). We benchmark various recent state-of-the-art methods on our dataset, compare the results against human performance, and find that existing systems fall short on analogical reasoning involving spatial relationships. The dataset and code will be made publicly available to facilitate future research.
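To make the task concrete, the following sketch solves a single attribute analogy (scene A is to scene B as scene C is to ?) over symbolic, CLEVR-style object descriptions. It is a minimal illustration under stated assumptions: the dictionary-of-attributes representation and the helper names `infer_transform` and `apply_transform` are hypothetical, it assumes one object per image with ground-truth attributes, and it is neither the CLEVR-Analogy data format nor any of the benchmarked methods, which work from raw images and natural language questions.

```python
from typing import Dict, Tuple

# Hypothetical symbolic object: attribute name -> value (CLEVR-style attributes).
Obj = Dict[str, str]
# Hypothetical transform: attribute name -> (value in A, value in B).
Transform = Dict[str, Tuple[str, str]]


def infer_transform(a: Obj, b: Obj) -> Transform:
    """Collect the shared attributes whose value changes from object a to object b."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


def apply_transform(c: Obj, transform: Transform) -> Obj:
    """Apply the A->B changes to object c.

    An attribute is rewritten only when c has the same source value as a;
    otherwise it is left unchanged (the analogy is treated as not applicable).
    """
    d = dict(c)
    for k, (old, new) in transform.items():
        if c.get(k) == old:
            d[k] = new
    return d


# A : B :: C : ?  (red cube -> blue cube, so red sphere -> blue sphere)
a = {"shape": "cube", "color": "red", "size": "large", "material": "rubber"}
b = {"shape": "cube", "color": "blue", "size": "large", "material": "rubber"}
c = {"shape": "sphere", "color": "red", "size": "small", "material": "metal"}

print(apply_transform(c, infer_transform(a, b)))
# {'shape': 'sphere', 'color': 'blue', 'size': 'small', 'material': 'metal'}
```

Only attributes that change between A and B are transferred, and only when C shares the source value. Spatial, existence, and arithmetic analogies would need richer multi-object scene representations than this single-object sketch.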
Pages: 2463-2467
Number of pages: 5