INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Cited by: 1
Authors
Parelli, Maria [1 ,2 ]
Mallis, Dimitrios [1 ]
Diomataris, Markos [1 ,2 ]
Pitsikalis, Vassilis [1 ]
Affiliations
[1] DeepLab, Athens, Greece
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Keywords
Visual Question Answering; Visual Grounding; Interpretability; Attention Similarity
DOI
10.1109/ICIP49359.2023.10223156
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-based architectures have recently demonstrated remarkable performance in the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer, with such annotations being already available on large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the learned attention distributions guided by the question and the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and lead to performance increase, without requiring training on explicit grounding annotations.
Pages: 2525-2529
Page count: 5
Related papers
50 in total
  • [1] Interpretable Visual Question Answering by Reasoning on Dependency Trees
    Cao, Qingxing
    Liang, Xiaodan
    Li, Bailin
    Lin, Liang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (03) : 887 - 901
  • [2] Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
    Zhang, Yundong
    Niebles, Juan Carlos
    Soto, Alvaro
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 349 - 357
  • [3] Medical Visual Question Answering via Conditional Reasoning
    Zhan, Li-Ming
    Liu, Bo
    Fan, Lu
    Chen, Jiaxin
    Wu, Xiao-Ming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2345 - 2354
  • [4] WeaQA: Weak Supervision via Captions for Visual Question Answering
    Banerjee, Pratyay
    Gokhale, Tejas
    Yang, Yezhou
    Baral, Chitta
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3420 - 3435
  • [5] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
  • [6] Graph Strategy for Interpretable Visual Question Answering
    Sarkisyan, Christina
    Savelov, Mikhail
    Kovalev, Alexey K.
    Panov, Aleksandr I.
    ARTIFICIAL GENERAL INTELLIGENCE, AGI 2022, 2023, 13539 : 86 - 99
  • [7] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [8] Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning
    Liu, Bo
    Zhan, Li-Ming
    Xu, Li
    Wu, Xiao-Ming
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (05) : 1532 - 1545
  • [9] NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset
    Zhang, Qiyuan
    Wang, Lei
    Yu, Sicheng
    Wang, Shuohang
    Wang, Yang
    Jiang, Jing
    Lim, Ee-Ping
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 4147 - 4161
  • [10] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
    Yang, Zhuoqian
    Qin, Zengchang
    Yu, Jing
    Wan, Tao
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415