INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Times cited: 2
Authors
Parelli, Maria [1,2]
Mallis, Dimitrios [1]
Diomataris, Markos [1,2]
Pitsikalis, Vassilis [1]
Affiliations
[1] DeepLab, Athens, Greece
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Keywords
Visual Question Answering; Visual Grounding; Interpretability; Attention Similarity
DOI
10.1109/ICIP49359.2023.10223156
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformer-based architectures have recently demonstrated remarkable performance on the Visual Question Answering (VQA) task. However, such models tend to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as a lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer; such annotations are already available in large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the attention distributions learned under the guidance of the question and of the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and lead to a performance increase, without requiring training on explicit grounding annotations.
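The abstract does not give the exact form of the attention similarity loss. As a minimal sketch, assuming that each attention map is a softmax over per-region scores for the same set of image regions (one pass conditioned on the question, one on the reasoning text) and assuming KL divergence as the similarity measure (the paper may use a different one), the auxiliary alignment term could be computed as below. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections.abc import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Turn raw per-region attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_similarity_loss(
    question_scores: Sequence[float],
    reasoning_scores: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Hypothetical alignment term: KL(p_reasoning || p_question).

    ASSUMPTION: KL divergence is used as the similarity measure; the paper's
    actual loss may differ. Both inputs must score the same image regions.
    """
    if len(question_scores) != len(reasoning_scores):
        raise ValueError("both attention maps must cover the same regions")
    p_q = softmax(question_scores)
    p_r = softmax(reasoning_scores)
    # Exactly zero when the two attention distributions are identical.
    return sum(r * math.log((r + eps) / (q + eps)) for q, r in zip(p_q, p_r))


def total_loss(answer_loss: float, similarity_loss: float, weight: float = 1.0) -> float:
    """Hypothetical combined objective: answer loss plus a weighted alignment term."""
    return answer_loss + weight * similarity_loss
```

For example, `attention_similarity_loss([2.0, 0.1, 0.1], [0.1, 0.1, 2.0])` is positive because the two passes attend to different regions, while identical score lists give `0.0`. Whether gradients should flow through the reasoning-guided branch, or whether it should act as a fixed target, is a design choice the abstract does not specify.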
Pages: 2525-2529
Number of pages: 5