Exploring and exploiting model uncertainty for robust visual question answering

Cited: 0
Authors
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China [1]
Unknown [2]
Unknown [3]
Unknown [4]
Institutions
[1] Hefei University of Technology,School of Computer Science and Information Engineering
[2] Hefei Comprehensive National Science Center,Institute of Dataspace
[3] Fuyang Normal University,School of Computer and Information Engineering
[4] University of Science and Technology of China,School of Information Science and Technology
Source
Multimedia Systems | Vol. 6, Issue 6
DOI
10.1007/s00530-024-01560-0
Abstract
Visual Question Answering (VQA) methods have been widely shown to answer questions in a biased way, because the distribution of answer samples differs between training and testing, which degrades performance. While numerous efforts have demonstrated promising results in overcoming language bias, broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., they are overconfident, which indicates high uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method to address the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly imposes constraints on model predictions to make the model less confident during training. It has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually suffer in generalization, and fortunately their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
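The abstract names two concrete components, an overconfidence metric and a training-time calibration constraint, but does not give their formulas. As a reading aid only, the sketch below shows one common way such ideas are realized in PyTorch: overconfidence measured as the mean top-1 confidence on wrongly answered questions, and calibration as an entropy bonus (a confidence penalty) added to the answer-classification loss. Every name here (overconfidence_score, calibrated_vqa_loss, the weight beta) is a hypothetical stand-in under those assumptions, not the paper's actual definitions; the authors' implementation is in the linked repository.

# Hypothetical sketch of the two ideas in the abstract, NOT the authors' code.
import torch
import torch.nn.functional as F

def overconfidence_score(logits, targets):
    # Assumed metric: mean top-1 confidence on wrongly answered samples
    # (higher = more overconfident).
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    wrong = pred != targets
    return conf[wrong].mean() if wrong.any() else torch.tensor(0.0)

def calibrated_vqa_loss(logits, targets, beta=0.1):
    # Cross-entropy plus an entropy bonus: maximizing prediction entropy
    # flattens the answer distribution, discouraging overconfidence.
    # beta is an assumed hyperparameter, not taken from the paper.
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    return ce - beta * entropy

# Toy usage: a batch of 4 questions over 10 candidate answers.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 3, 5, 7])
loss = calibrated_vqa_loss(logits, targets)
loss.backward()
print(loss.item(), overconfidence_score(logits.detach(), targets).item())

Because such a penalty touches only the loss function, it is model-agnostic and adds negligible compute, which is consistent with the properties the abstract claims for the proposed method.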
Related Papers (50 in total)
  • [41] Deng, Wei; Wang, Jianming; Wang, Shengbei; Jin, Guanghao. Flexible Sentence Analysis Model for Visual Question Answering Network. 2018 2nd International Conference on Biomedical Engineering and Bioinformatics (ICBEB 2018), 2018: 89-95.
  • [42] Liu, Lei; Su, Xiangdong; Guo, Hui; Zhu, Daobin. A Transformer-based Medical Visual Question Answering Model. 2022 26th International Conference on Pattern Recognition (ICPR), 2022: 1712-1718.
  • [43] Lao, Mingrui; Guo, Yanming; Chen, Wei; Pu, Nan; Lew, Michael S. VQA-BC: Robust Visual Question Answering via Bidirectional Chaining. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4833-4837.
  • [44] Liu, Jinlai; Wu, Chenfei; Wang, Xiaojie; Dong, Xuan. Sequential Visual Reasoning for Visual Question Answering. Proceedings of 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), 2018: 410-415.
  • [45] Niu, Yulei; Zhang, Hanwang. Introspective Distillation for Robust Question Answering. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [46] Zhang, Haotian; Wu, Wei. Transformer Gate Attention Model: An Improved Attention Model for Visual Question Answering. 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
  • [47] Rahman, Tanzila; Chou, Shih-Han; Sigal, Leonid; Carenini, Giuseppe. An Improved Attention for Visual Question Answering. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021: 1653-1662.
  • [48] Bongini, Pietro; Becattini, Federico; Bagdanov, Andrew D.; Del Bimbo, Alberto. Visual Question Answering for Cultural Heritage. International Conference Florence Heri-Tech: The Future of Heritage Science and Technologies, 2020, 949.
  • [49] Gao, Lianli; Cao, Liangfu; Xu, Xing; Shao, Jie; Song, Jingkuan. Question-Led Object Attention for Visual Question Answering. Neurocomputing, 2020, 391: 227-233.
  • [50] Toor, Andeep S.; Wechsler, Harry; Nappi, Michele. Question Action Relevance and Editing for Visual Question Answering. Multimedia Tools and Applications, 2019, 78: 2921-2935.