INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Cited by: 2
Authors
Parelli, Maria [1 ,2 ]
Mallis, Dimitrios [1 ]
Diomataris, Markos [1 ,2 ]
Pitsikalis, Vassilis [1 ]
Affiliations
[1] DeepLab, Athens, Greece
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Keywords
Visual Question Answering; Visual Grounding; Interpretability; Attention Similarity;
DOI
10.1109/ICIP49359.2023.10223156
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based architectures have recently demonstrated remarkable performance on the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer, with such annotations already available in large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the learned attention distributions guided by the question and by the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and lead to a performance increase, without requiring training on explicit grounding annotations.
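The core mechanism described above can be sketched in a few lines: compute the attention distribution over visual regions guided by the question, compute the distribution guided by the reasoning text, and penalize their divergence. This is a minimal illustrative sketch only; the function names and the choice of KL divergence as the similarity loss are assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    # Convert raw attention scores into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two attention distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attention_alignment_loss(question_logits, reasoning_logits):
    # Attention over the same set of visual regions, scored once by the
    # question and once by the ground-truth reasoning text. The loss
    # pulls the question-guided attention toward the reasoning-guided one.
    # (Hypothetical formulation; the paper specifies only "a similarity loss".)
    p = softmax(question_logits)
    q = softmax(reasoning_logits)
    return kl_divergence(q, p)
```

In training, a term like this would be added to the standard answer-classification loss, so that the reasoning annotation shapes where the model looks without requiring explicit bounding-box grounding labels.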
Pages: 2525-2529
Page count: 5