RUBi: Reducing Unimodal Biases for Visual Question Answering

Cited by: 0
Authors
Cadene, Remi [1 ]
Dancette, Corentin [1 ]
Ben-Younes, Hedi [1 ]
Cord, Matthieu [1 ]
Parikh, Devi [2 ,3 ]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Facebook AI Res, Menlo Pk, CA 94025 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a large drop in performance when evaluated on data outside their training set distribution, a critical issue that makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image, and implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures language biases by identifying when these unwanted regularities are used, and prevents the base VQA model from learning them by influencing its predictions. This dynamically adjusts the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch.
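To make the abstract's masking mechanism concrete, the sketch below shows one way the strategy could look in PyTorch (the framework of the linked repository). The class name RUBiCriterion, the layer sizes, and the placement of the gradient detach are illustrative assumptions, not the authors' exact implementation; the real code is in the repository above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RUBiCriterion(nn.Module):
    # Question-only branch that captures language biases, plus the
    # dynamically adjusted loss described in the abstract. Names and
    # sizes are illustrative assumptions, not the paper's exact setup.
    def __init__(self, q_dim: int, num_answers: int):
        super().__init__()
        self.q_branch = nn.Sequential(
            nn.Linear(q_dim, q_dim), nn.ReLU(),
            nn.Linear(q_dim, num_answers),
        )

    def forward(self, base_logits, q_emb, answers):
        # Detach so the question-only loss does not reshape the shared
        # question encoder (one simple choice; the official repo's exact
        # gradient handling may differ).
        q_logits = self.q_branch(q_emb.detach())
        # Sigmoid mask in (0, 1): examples the question-only branch already
        # answers confidently have their base logits rescaled, lowering
        # their contribution to the gradient of the main loss.
        mask = torch.sigmoid(q_logits)
        fused_logits = base_logits * mask
        loss_vqa = F.cross_entropy(fused_logits, answers)  # main, bias-reduced loss
        loss_q = F.cross_entropy(q_logits, answers)        # trains the bias branch
        return loss_vqa + loss_q

# Toy usage: 8 questions, 512-d question embeddings, 3000 candidate answers.
criterion = RUBiCriterion(q_dim=512, num_answers=3000)
base_logits = torch.randn(8, 3000, requires_grad=True)  # any base VQA model's output
q_emb = torch.randn(8, 512)                             # question encoder output
answers = torch.randint(0, 3000, (8,))
criterion(base_logits, q_emb, answers).backward()

At test time the question-only branch is discarded and the base model's unmasked logits are used, so the deployed VQA architecture is unchanged.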
Pages: 12
Related Papers
50 records in total
  • [41] Scene Text Visual Question Answering
    Biten, Ali Furkan
    Tito, Ruben
    Mafla, Andres
    Gomez, Lluis
    Rusinol, Marcal
    Valveny, Ernest
    Jawahar, C. V.
    Karatzas, Dimosthenis
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4290 - 4300
  • [42] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [43] Visual Question Answering for Intelligent Interaction
    Gao, Panpan
    Sun, Hanxu
    Chen, Gang
    Wang, Ruiquan
    Li, Minggang
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [44] Differential Networks for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Li, Ruifan
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8997 - 9004
  • [45] Document Collection Visual Question Answering
    Tito, Ruben
    Karatzas, Dimosthenis
    Valveny, Ernest
    DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II, 2021, 12822 : 778 - 792
  • [46] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953
  • [47] LANGUAGE AND VISUAL RELATIONS ENCODING FOR VISUAL QUESTION ANSWERING
    Liu, Fei
    Liu, Jing
    Fang, Zhiwei
    Lu, Hanqing
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3307 - 3311
  • [48] Compositional Substitutivity of Visual Reasoning for Visual Question Answering
    Li, Chuanhao
    Li, Zhen
    Jing, Chenchen
    Wu, Yuwei
    Zhai, Mingliang
    Jia, Yunde
    COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106 : 143 - 160
  • [49] Visual Question Answering using Explicit Visual Attention
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018
  • [50] Exploiting hierarchical visual features for visual question answering
    Hong, Jongkwang
    Fu, Jianlong
    Uh, Youngjung
    Mei, Tao
    Byun, Hyeran
    NEUROCOMPUTING, 2019, 351 : 187 - 195