Question Type-Aware Debiasing for Test-Time Visual Question Answering Model Adaptation

Times Cited: 0
Authors
Liu, Jin [1 ]
Xie, Jialong [1 ]
Zhou, Fengyu [1 ]
He, Shengfeng [2 ]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[2] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 178902, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Test-time adaptation; visual question answering; language debiasing;
DOI
10.1109/TCSVT.2024.3410041
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
In Visual Question Answering (VQA), addressing language prior bias, where models excessively rely on superficial correlations between questions and answers, is crucial. This issue becomes more pronounced in real-world applications with diverse domains and varied question-answer distributions during testing. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA models to adapt using unlabeled test samples. Current state-of-the-art models select reliable test samples based on fixed entropy thresholds and employ self-supervised debiasing techniques. However, these methods struggle with diverse answer spaces linked to different question types and may fail to identify biased samples that still leverage relevant visual context. In this paper, we propose Question type-guided Entropy Minimization and Debiasing (QED) as a solution for test-time VQA model adaptation. Our approach involves adaptive entropy minimization based on question types to improve the identification of fine-grained and unreliable samples. Additionally, we generate negative samples for each test sample, label them as biased if their answer entropy change rate significantly differs from that of the positive test samples, and subsequently remove them. We evaluate our approach on two public benchmarks, VQA-CP v2 and VQA-CP v1, and achieve new state-of-the-art results, with overall accuracy rates of 48.13% and 46.18%, respectively.
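The abstract describes two ingredients of QED: question type-aware entropy thresholds for selecting reliable test samples, and negative samples whose answer entropy change rate exposes predictions driven by language bias. The paper's code is not part of this record, so the following is only a minimal PyTorch-style sketch of that idea under stated assumptions: the model interface model(image, question), the per-type quantile threshold, the image-masking negative sample, and the rate_gap margin are all illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of question type-guided test-time adaptation (not the
# authors' released code). Model interface, thresholds, the way negatives are
# built (masking the image), and the rate_gap margin are assumptions.
import torch
import torch.nn.functional as F


def entropy(logits):
    """Shannon entropy of the softmax answer distribution, per sample."""
    p = F.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)


@torch.no_grad()
def question_type_thresholds(model, loader, quantile=0.5):
    """Estimate one entropy threshold per question type (e.g. 'what color',
    'how many'), so reliability is judged within each answer subspace."""
    per_type = {}
    for image, question, qtype in loader:  # loader assumed to yield this triple
        h = entropy(model(image, question))
        for t, e in zip(qtype, h):
            per_type.setdefault(t, []).append(e.item())
    return {t: torch.tensor(v).quantile(quantile).item() for t, v in per_type.items()}


def adapt_step(model, optimizer, image, question, qtype, thresholds, rate_gap=0.5):
    """One test-time update on a batch: keep reliable, visually grounded samples
    and minimize their answer entropy."""
    logits_pos = model(image, question)                     # positive (full) input
    logits_neg = model(torch.zeros_like(image), question)   # negative: image removed
    h_pos, h_neg = entropy(logits_pos), entropy(logits_neg)

    thr = torch.tensor([thresholds.get(t, float("inf")) for t in qtype],
                       device=h_pos.device)
    reliable = h_pos < thr                                   # type-aware selection
    # Treat a sample as biased if dropping the image barely changes the answer
    # entropy, i.e. the prediction leans on the question alone; such samples
    # are removed from the adaptation objective.
    unbiased = (h_neg - h_pos) / h_pos.clamp_min(1e-6) > rate_gap
    keep = reliable & unbiased
    if keep.any():
        loss = h_pos[keep].mean()                            # entropy minimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return keep
```

In the method as described, the entropy change rate of a negative sample is compared against the positive test samples' behavior; the fixed rate_gap margin above merely stands in for that criterion to keep the sketch short.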
Pages: 10805 - 10816
Number of Pages: 12
Related Papers
50 records in total
  • [21] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
    Lofstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [22] Vector Semiotic Model for Visual Question Answering
    Kovalev, Alexey K.
    Shaban, Makhmud
    Osipov, Evgeny
    Panov, Aleksandr I.
    COGNITIVE SYSTEMS RESEARCH, 2022, 71 : 52 - 63
  • [23] ConceptBert: Concept-Aware Representation for Visual Question Answering
    Garderes, Francois
    Ziaeefard, Maryam
    Abeloos, Baptiste
    Lecue, Freddy
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 489 - 498
  • [24] Visual question answering with gated relation-aware auxiliary
    Shao, Xiangjun
    Xiang, Zhenglong
    Li, Yuanxiang
    IET IMAGE PROCESSING, 2022, 16 (05) : 1424 - 1432
  • [25] Type-Aware Question Answering over Knowledge Base with Attention-Based Tree-Structured Neural Networks
    Yin, Jun
    Zhao, Wayne Xin
    Li, Xiao-Ming
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 : 805 - 813
  • [26] Answer-Type Prediction for Visual Question Answering
    Kafle, Kushal
    Kanan, Christopher
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 4976 - 4984
  • [27] Visual question answering model based on visual relationship detection
    Xi, Yuling
    Zhang, Yanning
    Ding, Songtao
    Wan, Shaohua
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 80
  • [28] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    PATTERN RECOGNITION, 2022, 132
  • [29] Context-aware Multi-level Question Embedding Fusion for visual question answering
    Li, Shengdong
    Gong, Chen
    Zhu, Yuqing
    Luo, Chuanwen
    Hong, Yi
    Lv, Xueqiang
    INFORMATION FUSION, 2024, 102
  • [30] Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
    Si, Qingyi
    Liu, Yuanxin
    Lin, Zheng
    Fu, Peng
    Cao, Yanan
    Wang, Weiping
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 513 - 529