Greedy Gradient Ensemble for Robust Visual Question Answering

Cited by: 30
Authors
Han, Xinzhe [1 ,2 ]
Wang, Shuhui [1 ]
Su, Chi [3 ]
Huang, Qingming [1 ,2 ,4 ]
Tian, Qi [5 ]
Affiliations
[1] Chinese Acad Sci, Inst Comput Tech, Key Lab Intell Info Proc, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Kingsoft Cloud, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] Huawei Technol, Cloud BU, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
DOI
10.1109/ICCV48922.2021.00161
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases to make the final decision without considering the image information. As a result, they suffer from a performance drop on out-of-distribution data and provide inadequate visual explanations. Based on an experimental analysis of existing robust VQA methods, we stress that language bias in VQA arises from two aspects, i.e., distribution bias and shortcut bias. We further propose a new de-biasing framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. With the greedy strategy, GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard for the biased models to solve. Experiments demonstrate that our method makes better use of visual information and achieves state-of-the-art performance on the diagnostic dataset VQA-CP without using extra annotations.
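The abstract sketches the core training idea: biased branches greedily absorb the dataset bias so that the base VQA model concentrates on examples the bias alone cannot explain. Below is a minimal, hypothetical PyTorch sketch of that residual-ensemble idea with a single question-only biased branch; the module names, feature dimensions, and loss composition are illustrative assumptions rather than the authors' released code, and the paper's full greedy, multi-round gradient-based procedure is more involved.

```python
# Hypothetical sketch of the greedy-residual training idea behind GGE.
# Model classes, feature shapes, and hyper-parameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedQuestionModel(nn.Module):
    """Question-only branch intended to absorb the language bias."""
    def __init__(self, q_dim=512, n_answers=3000):
        super().__init__()
        self.fc = nn.Linear(q_dim, n_answers)

    def forward(self, q_feat):
        return self.fc(q_feat)

class BaseVQAModel(nn.Module):
    """Full multimodal branch; here a simple fusion of question and image features."""
    def __init__(self, q_dim=512, v_dim=2048, n_answers=3000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, 512)
        self.v_proj = nn.Linear(v_dim, 512)
        self.cls = nn.Linear(512, n_answers)

    def forward(self, q_feat, v_feat):
        fused = torch.relu(self.q_proj(q_feat)) * torch.relu(self.v_proj(v_feat))
        return self.cls(fused)

def gge_step(biased, base, q_feat, v_feat, labels):
    """One training step: the biased branch fits the answer labels on its own,
    then the base branch fits what the biased branch leaves unexplained."""
    biased_logits = biased(q_feat)
    # The biased branch is encouraged to over-fit the language prior.
    loss_bias = F.binary_cross_entropy_with_logits(biased_logits, labels)
    # The base branch is trained so that (frozen biased prediction + base prediction)
    # explains the labels; no gradient flows back into the biased branch.
    ensemble_logits = biased_logits.detach() + base(q_feat, v_feat)
    loss_base = F.binary_cross_entropy_with_logits(ensemble_logits, labels)
    return loss_bias + loss_base
```

At inference time only the base branch would be kept, so in this sketch the biased branch acts purely as a training-time regularizer that diverts easy, bias-explainable examples away from the base model.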
Pages: 1564 - 1573
Number of pages: 10
Related Papers
50 records in total
  • [1] Robust Explanations for Visual Question Answering
    Patro, Badri N.
    Patel, Shivansh
    Namboodiri, Vinay P.
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1566 - 1575
  • [2] Generative Bias for Robust Visual Question Answering
    Cho, Jae Won
    Kim, Dong-Jin
    Ryu, Hyeonggon
    Kweon, In So
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11681 - 11690
  • [3] Cycle-Consistency for Robust Visual Question Answering
    Shah, Meet
    Chen, Xinlei
    Rohrbach, Marcus
    Parikh, Devi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6642 - 6651
  • [4] Rethinking Data Augmentation for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Xiao, Jun
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 95 - 112
  • [5] On the role of question encoder sequence model in robust visual question answering
    Kv, Gouthaman
    Mittal, Anurag
    PATTERN RECOGNITION, 2022, 131
  • [6] Fair Attention Network for Robust Visual Question Answering
    Bi Y.
    Jiang H.
    Hu Y.
    Sun Y.
    Yin B.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 1 - 1
  • [7] Explicit ensemble attention learning for improving visual question answering
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    PATTERN RECOGNITION LETTERS, 2018, 111 : 51 - 57
  • [8] Self-Critical Reasoning for Robust Visual Question Answering
    Wu, Jialin
    Mooney, Raymond J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] R-VQA: A robust visual question answering model
    Chowdhury, Souvik
    Soni, Badal
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [10] Robust Visual Question Answering: Datasets, Methods, and Future Challenges
    Ma, Jie
    Wang, Pinghui
    Kong, Dechen
    Wang, Zewei
    Liu, Jun
    Pei, Hongbin
    Zhao, Junzhou
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5575 - 5594