INTERPRETABLE VISUAL QUESTION ANSWERING VIA REASONING SUPERVISION

Cited by: 1

Authors:

Parelli, Maria [1,2]

Mallis, Dimitrios [1]

Diomataris, Markos [1,2]

Pitsikalis, Vassilis [1]

Affiliations:

[1] DeepLab, Athens, Greece

[2] Swiss Federal Institute of Technology, Zurich, Switzerland

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023

Keywords:

Visual Question Answering; Visual Grounding; Interpretability; Attention Similarity

DOI:

10.1109/ICIP49359.2023.10223156

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Transformer-based architectures have recently demonstrated remarkable performance in the Visual Question Answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on multimodal shortcuts and inherent biases of the language modality to predict the correct answer, a phenomenon commonly referred to as lack of visual grounding. In this work, we alleviate this shortcoming through a novel architecture for visual question answering that leverages common sense reasoning as a supervisory signal. Reasoning supervision takes the form of a textual justification of the correct answer, with such annotations being already available on large-scale Visual Common Sense Reasoning (VCR) datasets. The model's visual attention is guided toward important elements of the scene through a similarity loss that aligns the learned attention distributions guided by the question and the correct reasoning. We demonstrate both quantitatively and qualitatively that the proposed approach can boost the model's visual perception capability and lead to performance increase, without requiring training on explicit grounding annotations.
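
The attention-alignment idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of the similarity loss, so the cosine-distance formulation below is an assumption, and the function names, region count, and attention logits are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw attention logits into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_similarity_loss(question_attn, reasoning_attn):
    """Cosine distance between two attention distributions.

    ASSUMPTION: the record only says a similarity loss aligns the
    question-guided and reasoning-guided attention; cosine distance
    is one plausible choice, not necessarily the authors' choice.
    """
    dot = sum(q * r for q, r in zip(question_attn, reasoning_attn))
    norm_q = math.sqrt(sum(q * q for q in question_attn))
    norm_r = math.sqrt(sum(r * r for r in reasoning_attn))
    return 1.0 - dot / (norm_q * norm_r)


# Hypothetical attention logits over four image regions.
question_attn = softmax([2.0, 0.5, 0.1, 0.1])   # attention guided by the question
reasoning_attn = softmax([0.2, 0.4, 2.5, 0.1])  # attention guided by the justification

aligned = attention_similarity_loss(question_attn, question_attn)
misaligned = attention_similarity_loss(question_attn, reasoning_attn)
```

In this sketch the loss is close to zero when both attention maps focus on the same regions and grows as they diverge, so minimizing it during training would pull the question-guided attention toward the regions highlighted by the correct reasoning.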

Citation

Pages: 2525-2529

Page count: 5

50 items in total

[1] Interpretable Visual Question Answering by Reasoning on Dependency Trees
Cao, Qingxing
Liang, Xiaodan
Li, Bailin
Lin, Liang
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (03) : 887 - 901
[2] Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
Zhang, Yundong
Niebles, Juan Carlos
Soto, Alvaro
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 349 - 357
[3] Medical Visual Question Answering via Conditional Reasoning
Zhan, Li-Ming
Liu, Bo
Fan, Lu
Chen, Jiaxin
Wu, Xiao-Ming
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2345 - 2354
[4] WeaQA: Weak Supervision via Captions for Visual Question Answering
Banerjee, Pratyay
Gokhale, Tejas
Yang, Yezhou
Baral, Chitta
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3420 - 3435
[5] Sequential Visual Reasoning for Visual Question Answering
Liu, Jinlai
Wu, Chenfei
Wang, Xiaojie
Dong, Xuan
PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
[6] Graph Strategy for Interpretable Visual Question Answering
Sarkisyan, Christina
Savelov, Mikhail
Kovalev, Alexey K.
Panov, Aleksandr I.
ARTIFICIAL GENERAL INTELLIGENCE, AGI 2022, 2023, 13539 : 86 - 99
[7] Chain of Reasoning for Visual Question Answering
Wu, Chenfei
Liu, Jinlai
Wang, Xiaojie
Dong, Xuan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
[8] Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning
Liu, Bo
Zhan, Li-Ming
Xu, Li
Wu, Xiao-Ming
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (05) : 1532 - 1545
[9] NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset
Zhang, Qiyuan
Wang, Lei
Yu, Sicheng
Wang, Shuohang
Wang, Yang
Jiang, Jing
Lim, Ee-Ping
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 4147 - 4161
[10] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
Yang, Zhuoqian
Qin, Zengchang
Yu, Jing
Wan, Tao
2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415
