Exploring and exploiting model uncertainty for robust visual question answering

Cited by: 0
Authors
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China [1]
Not specified [2]
Not specified [3]
Not specified [4]
Affiliations
[1] Hefei University of Technology, School of Computer Science and Information Engineering
[2] Hefei Comprehensive National Science Center, Institute of Dataspace
[3] Fuyang Normal University, School of Computer and Information Engineering
[4] University of Science and Technology of China, School of Information Science and Technology
Source
Multimedia Systems | Vol. 6 / Issue 6
Keywords
Digital elevation model
DOI
10.1007/s00530-024-01560-0
Abstract
Visual Question Answering (VQA) methods have been widely shown to exhibit bias in answering questions due to distribution differences of answer samples between training and testing, resulting in performance degradation. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards making obviously incorrect answers with high confidence, i.e., being overconfident, which indicates high uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method to address the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly imposes constraints on model predictions to make the model less confident during training. It has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually suffer in terms of generalization, and that their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
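The abstract does not spell out the exact form of the training-time constraint, but one common, model-agnostic way to make a classifier less confident is a confidence penalty (entropy regularization) added to the standard answer-classification loss. The sketch below illustrates that generic idea in PyTorch; the function name calibrated_vqa_loss and the weight beta are assumptions for illustration, not the formulation used in the paper.

```python
# Hypothetical sketch of a confidence-penalty regularizer for VQA training.
# This is NOT the paper's exact method; the penalty form and `beta` are assumptions.
import torch
import torch.nn.functional as F


def calibrated_vqa_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """Answer-classification loss plus an entropy bonus that discourages
    over-confident predictions.

    logits:  (batch, num_answers) raw scores from any VQA model
    targets: (batch,) ground-truth answer indices
    beta:    strength of the confidence penalty (assumed hyperparameter)
    """
    # Usual cross-entropy over the answer vocabulary.
    ce = F.cross_entropy(logits, targets)

    # Entropy of the predicted answer distribution; subtracting it from the
    # loss rewards flatter (less peaked) predictions.
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    return ce - beta * entropy
```

Because such a penalty only modifies the loss, it can be dropped into an existing VQA training loop without architectural changes, which is consistent with the model-agnostic, computationally efficient property claimed in the abstract.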