Greedy Gradient Ensemble for Robust Visual Question Answering

Cited by: 30
Authors
Han, Xinzhe [1 ,2 ]
Wang, Shuhui [1 ]
Su, Chi [3 ]
Huang, Qingming [1 ,2 ,4 ]
Tian, Qi [5 ]
Affiliations
[1] Chinese Acad Sci, Inst Comput Tech, Key Lab Intell Info Proc, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Kingsoft Cloud, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] Huawei Technol, Cloud BU, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
DOI
10.1109/ICCV48922.2021.00161
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases to make the final decision without considering the image information. As a result, they suffer from a performance drop on out-of-distribution data and provide inadequate visual explanations. Based on an experimental analysis of existing robust VQA methods, we stress that language bias in VQA arises from two aspects, i.e., distribution bias and shortcut bias. We further propose a new de-biasing framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. With the greedy strategy, GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard for the biased models to solve. Experiments demonstrate that our method makes better use of visual information and achieves state-of-the-art performance on the diagnostic dataset VQA-CP without using extra annotations.
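The abstract sketches the core training idea: biased branches greedily absorb the dataset bias so that the base VQA model concentrates on examples the bias alone cannot explain. Below is a minimal, hypothetical PyTorch sketch of that residual-ensemble idea with a single question-only biased branch; the module names, feature dimensions, and loss composition are illustrative assumptions rather than the authors' released code, and the paper's full greedy, multi-round gradient-based procedure is more involved.

```python
# Hypothetical sketch of the greedy-residual training idea behind GGE.
# Model classes, feature shapes, and hyper-parameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedQuestionModel(nn.Module):
    """Question-only branch intended to absorb the language bias."""
    def __init__(self, q_dim=512, n_answers=3000):
        super().__init__()
        self.fc = nn.Linear(q_dim, n_answers)

    def forward(self, q_feat):
        return self.fc(q_feat)

class BaseVQAModel(nn.Module):
    """Full multimodal branch; here a simple fusion of question and image features."""
    def __init__(self, q_dim=512, v_dim=2048, n_answers=3000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, 512)
        self.v_proj = nn.Linear(v_dim, 512)
        self.cls = nn.Linear(512, n_answers)

    def forward(self, q_feat, v_feat):
        fused = torch.relu(self.q_proj(q_feat)) * torch.relu(self.v_proj(v_feat))
        return self.cls(fused)

def gge_step(biased, base, q_feat, v_feat, labels):
    """One training step: the biased branch fits the answer labels on its own,
    then the base branch fits what the biased branch leaves unexplained."""
    biased_logits = biased(q_feat)
    # The biased branch is encouraged to over-fit the language prior.
    loss_bias = F.binary_cross_entropy_with_logits(biased_logits, labels)
    # The base branch is trained so that (frozen biased prediction + base prediction)
    # explains the labels; no gradient flows back into the biased branch.
    ensemble_logits = biased_logits.detach() + base(q_feat, v_feat)
    loss_base = F.binary_cross_entropy_with_logits(ensemble_logits, labels)
    return loss_bias + loss_base
```

At inference time only the base branch would be kept, so in this sketch the biased branch acts purely as a training-time regularizer that diverts easy, bias-explainable examples away from the base model.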
Pages: 1564 - 1573
Number of pages: 10
Related Papers
50 records in total
  • [1] Robust Explanations for Visual Question Answering
    Patro, Badri N.
    Patel, Shivansh
    Namboodiri, Vinay P.
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1566 - 1575
  • [2] Generative Bias for Robust Visual Question Answering
    Cho, Jae Won
    Kim, Dong-Jin
    Ryu, Hyeonggon
    Kweon, In So
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11681 - 11690
  • [3] Cycle-Consistency for Robust Visual Question Answering
    Shah, Meet
    Chen, Xinlei
    Rohrbach, Marcus
    Parikh, Devi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6642 - 6651
  • [4] Rethinking Data Augmentation for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Xiao, Jun
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 95 - 112
  • [5] On the role of question encoder sequence model in robust visual question answering
    Kv, Gouthaman
    Mittal, Anurag
    PATTERN RECOGNITION, 2022, 131
  • [6] Fair Attention Network for Robust Visual Question Answering
    Bi Y.
    Jiang H.
    Hu Y.
    Sun Y.
    Yin B.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 1 - 1
  • [7] Explicit ensemble attention learning for improving visual question answering
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    PATTERN RECOGNITION LETTERS, 2018, 111 : 51 - 57
  • [8] Self-Critical Reasoning for Robust Visual Question Answering
    Wu, Jialin
    Mooney, Raymond J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] R-VQA: A robust visual question answering model
    Chowdhury, Souvik
    Soni, Badal
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [10] Robust Visual Question Answering: Datasets, Methods, and Future Challenges
    Ma, Jie
    Wang, Pinghui
    Kong, Dechen
    Wang, Zewei
    Liu, Jun
    Pei, Hongbin
    Zhao, Junzhou
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5575 - 5594