INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Cited by: 2
Authors
Parelli, Maria [1 ,2 ]
Mallis, Dimitrios [1 ]
Diomataris, Markos [1 ,2 ]
Pitsikalis, Vassilis [1 ]
Affiliations
[1] DeepLab, Athens, Greece
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Keywords
Visual Question Answering; Visual Grounding; Interpretability; Attention Similarity;
DOI
10.1109/ICIP49359.2023.10223156
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based architectures have recently demonstrated remarkable performance on the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer, with such annotations already available in large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the learned attention distributions guided by the question and by the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and lead to a performance increase, without requiring training on explicit grounding annotations.
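The core mechanism described above can be sketched in a few lines: compute the attention distribution over visual regions guided by the question, compute the distribution guided by the reasoning text, and penalize their divergence. This is a minimal illustrative sketch only; the function names and the choice of KL divergence as the similarity loss are assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    # Convert raw attention scores into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two attention distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attention_alignment_loss(question_logits, reasoning_logits):
    # Attention over the same set of visual regions, scored once by the
    # question and once by the ground-truth reasoning text. The loss
    # pulls the question-guided attention toward the reasoning-guided one.
    # (Hypothetical formulation; the paper specifies only "a similarity loss".)
    p = softmax(question_logits)
    q = softmax(reasoning_logits)
    return kl_divergence(q, p)
```

In training, a term like this would be added to the standard answer-classification loss, so that the reasoning annotation shapes where the model looks without requiring explicit bounding-box grounding labels.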
Pages: 2525-2529
Page count: 5