A DIAGNOSTIC STUDY OF VISUAL QUESTION ANSWERING WITH ANALOGICAL REASONING

Cited by: 1
Authors
Huang, Ziqi [1]
Zhu, Hongyuan [2]
Sun, Ying [2]
Choi, Dongkyu [3]
Tan, Cheston [2]
Lim, Joo-Hwee [1, 2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] ASTAR, I2R, Singapore, Singapore
[3] ASTAR, IHPC, Singapore, Singapore
Keywords
analogical reasoning; visual reasoning; Visual Question Answering (VQA); synthetic dataset; benchmark
DOI
10.1109/ICIP42928.2021.9506539
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The deep learning community has made rapid progress in low-level visual perception tasks such as object localization, detection and segmentation. However, for tasks such as Visual Question Answering (VQA) and visual language grounding that require high-level reasoning abilities, huge gaps still exist between artificial systems and human intelligence. In this work, we perform a diagnostic study of recent popular VQA methods in terms of analogical reasoning. We term the task Analogical VQA, where a system needs to reason over a group of images to find analogical relations among them in order to correctly answer a natural language question. To study the task in depth, we propose an initial diagnostic synthetic dataset, CLEVR-Analogy, which tests a range of analogical reasoning abilities (e.g. reasoning on object attributes, spatial relationships, existence, and arithmetic analogies). We benchmark various recent state-of-the-art methods on our dataset and compare the results against human performance, and discover that existing systems fall short when facing analogical reasoning involving spatial relationships. The dataset and code will be publicly available to facilitate future research.
Pages: 2463-2467
Number of pages: 5
Related papers
  • [1] Analogical Reasoning for Answer Ranking in Social Question Answering
    Tu, Xudong
    Feng, Dan
    Wang, Xin-Jing
    Zhang, Lei
    [J]. IEEE INTELLIGENT SYSTEMS, 2012, 27(05): 28-35
  • [2] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    [J]. PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018: 410-415
  • [3] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
    Yang, Zhuoqian
    Qin, Zengchang
    Yu, Jing
    Wan, Tao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020: 1411-1415
  • [5] Visual question answering by pattern matching and reasoning
    Zhan, Huayi
    Xiong, Peixi
    Wang, Xin
    Yang, Lan
    [J]. NEUROCOMPUTING, 2022, 467: 323-336
  • [6] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [7] Improving reasoning with contrastive visual information for visual question answering
    Long, Yu
    Tang, Pengjie
    Wang, Hanli
    Yu, Jian
    [J]. ELECTRONICS LETTERS, 2021, 57(20): 758-760
  • [8] Coarse-to-Fine Reasoning for Visual Question Answering
    Nguyen, Binh X.
    Tuong Do
    Huy Tran
    Tjiputra, Erman
    Tran, Quang D.
    Anh Nguyen
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4557-4565
  • [9] Medical Visual Question Answering via Conditional Reasoning
    Zhan, Li-Ming
    Liu, Bo
    Fan, Lu
    Chen, Jiaxin
    Wu, Xiao-Ming
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 2345-2354
  • [10] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    [J]. APPLIED INTELLIGENCE, 2024, 54(06): 5062-5080