Exploring and exploiting model uncertainty for robust visual question answering

Authors
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China [1]
Unknown [2]
Unknown [3]
Unknown [4]
Affiliations
[1] Hefei University of Technology,School of Computer Science and Information Engineering
[2] Hefei Comprehensive National Science Center,Institute of Dataspace
[3] Fuyang Normal University,School of Computer and Information Engineering
[4] University of Science and Technology of China,School of Information Science and Technology
Source
Multimedia Syst | / Vol. 6 / Issue 6
Keywords
Digital elevation model;
DOI
10.1007/s00530-024-01560-0
Abstract
Visual Question Answering (VQA) methods have been widely shown to exhibit bias when answering questions, because the distribution of answer samples differs between training and testing; this bias degrades performance. While numerous efforts have reported promising results in overcoming language bias, the broader implications of the problem (e.g., the trustworthiness of current VQA model predictions) remain unexplored. In this paper, we aim to provide a different viewpoint on the problem from the perspective of model uncertainty. In a series of empirical studies on the VQA-CP v2 dataset, we find that current VQA models are often biased towards giving obviously incorrect answers with high confidence, i.e., being overconfident, which indicates high uncertainty. In light of this observation, we: (1) design a novel metric for monitoring model overconfidence, and (2) propose a model calibration method to address the overconfidence issue, thereby making the model more reliable and better at generalization. The calibration method explicitly imposes constraints on model predictions to make the model less confident during training. It has the advantage of being model-agnostic and computationally efficient. Experiments demonstrate that VQA approaches exhibiting overconfidence usually generalize worse, and that their performance and trustworthiness can be boosted by adopting our calibration method. Code is available at https://github.com/HCI-LMC/VQA-Uncertainty
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
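The abstract does not spell out the metric or the calibration constraint, so the following is only an illustrative sketch of the general idea, not the authors' method. It swaps in a generic entropy-based confidence penalty (cross-entropy minus a weighted entropy term) as one way of making a classifier less confident during training, and a simple "mean confidence on wrong answers" statistic as one way of monitoring overconfidence. The function names, the `beta` weight, and the single-label setup are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def penalized_loss(logits, target, beta=0.1):
    """Cross-entropy minus beta * entropy of the prediction.

    Assumption: a generic confidence penalty, used here as a stand-in
    for the paper's (unspecified) constraint. Minimizing this loss
    rewards higher-entropy, i.e. less confident, predictions.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    return cross_entropy - beta * entropy(probs)


def wrong_answer_confidence(batch_logits, targets):
    """Mean top-1 probability over incorrectly answered samples.

    Assumption: a hypothetical overconfidence indicator, not the
    metric proposed in the paper. Returns 0.0 if nothing is wrong.
    """
    confidences = []
    for logits, target in zip(batch_logits, targets):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != target:
            confidences.append(probs[predicted])
    return sum(confidences) / len(confidences) if confidences else 0.0
```

A value of `wrong_answer_confidence` close to 1.0 would mean the model is nearly certain about answers it gets wrong; real VQA models are usually trained with soft multi-label targets, so a practical version would need to be adapted accordingly.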