Question Type-Aware Debiasing for Test-Time Visual Question Answering Model Adaptation

Times cited: 0
Authors
Liu, Jin [1]
Xie, Jialong [1]
Zhou, Fengyu [1]
He, Shengfeng [2]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[2] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 178902, Singapore
Funding
National Research Foundation, Singapore
Keywords
Test-time adaptation; visual question answering; language debiasing
DOI
10.1109/TCSVT.2024.3410041
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
In Visual Question Answering (VQA), addressing language prior bias, where models rely excessively on superficial correlations between questions and answers, is crucial. The issue becomes more pronounced in real-world applications, where test data span diverse domains and question-answer distributions. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA models to adapt using unlabeled test samples. Current state-of-the-art methods select reliable test samples with fixed entropy thresholds and employ self-supervised debiasing techniques. However, these methods struggle with the diverse answer spaces associated with different question types, and they may fail to identify biased samples that still leverage relevant visual context. In this paper, we propose Question type-guided Entropy Minimization and Debiasing (QED) for test-time VQA model adaptation. Our approach applies adaptive entropy minimization based on question types to better identify fine-grained, unreliable samples. Additionally, we generate negative samples for each test sample, label them as biased if their answer entropy change rate differs significantly from that of positive test samples, and remove them. We evaluate our approach on two public benchmarks, VQA-CP v2 and VQA-CP v1, and achieve new state-of-the-art results, with overall accuracies of 48.13% and 46.18%, respectively.
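The abstract names two mechanisms without giving formulas. As a rough illustration only (a minimal Python sketch under our own assumptions, not the authors' QED implementation), the snippet below (1) filters test samples with an entropy threshold that scales with each question type's answer-space size, alpha * ln(C_t), by analogy with the ln(C)-scaled fixed thresholds common in entropy-based TTA methods, and (2) flags samples whose entropy change rate against a negative sample is a statistical outlier within the batch. The threshold form, `alpha=0.4`, the two-sided z-score test with `k=2.0`, the sample keys `qtype`, `probs`, and `neg_probs`, and the use of the batch as the reference population of positive samples are all hypothetical choices; how negative samples are generated is left open.

```python
import math
from statistics import mean, pstdev


def entropy(probs):
    """Shannon entropy (natural log) of a predicted answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def type_threshold(num_answers, alpha=0.4):
    """HYPOTHETICAL per-question-type threshold alpha * ln(C_t), where C_t is
    the number of candidate answers for question type t."""
    return alpha * math.log(num_answers)


def select_reliable(samples, answer_space_sizes, alpha=0.4):
    """Keep samples whose prediction entropy falls below the threshold of
    their own question type (assumed criterion, not the paper's exact rule)."""
    return [
        s for s in samples
        if entropy(s["probs"]) < type_threshold(answer_space_sizes[s["qtype"]], alpha)
    ]


def entropy_change_rate(pos_probs, neg_probs):
    """Relative change in answer entropy when the test sample is replaced by
    its negative sample (negative-sample construction is an open assumption)."""
    h_pos = entropy(pos_probs)
    return (entropy(neg_probs) - h_pos) / max(h_pos, 1e-8)


def flag_biased(samples, k=2.0):
    """Flag samples whose entropy change rate lies more than k population
    standard deviations from the batch mean (HYPOTHETICAL z-score test)."""
    rates = [entropy_change_rate(s["probs"], s["neg_probs"]) for s in samples]
    mu, sigma = mean(rates), pstdev(rates)
    # If every rate is identical, sigma is 0 and nothing is flagged.
    return [abs(r - mu) > k * max(sigma, 1e-8) for r in rates]
```

With these defaults, a [0.6, 0.4] prediction (entropy about 0.67) is dropped for a question type with 2 candidate answers (threshold about 0.28) but kept for one with 1,000 candidates (threshold about 2.76), a distinction a single fixed threshold cannot make. The entropy-minimization update on the kept, unflagged samples is omitted.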
Pages: 10805-10816
Number of pages: 12
Related papers
50 in total
  • [31] Language-aware Visual Semantic Distillation for Video Question Answering
    Zou, Bo
    Yang, Chao
    Qiao, Yu
    Quan, Chengbin
    Zhao, Youjian
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 27103-27113
  • [32] Boosting Visual Question Answering with Context-aware Knowledge Aggregation
    Li, Guohao
    Wang, Xin
    Zhu, Wenwu
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 1227-1235
  • [33] Semantic-Aware Modular Capsule Routing for Visual Question Answering
    Han, Yudong
    Yin, Jianhua
    Wu, Jianlong
    Wei, Yinwei
    Nie, Liqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 5537-5549
  • [34] Relation-Aware Image Captioning for Explainable Visual Question Answering
    Tseng, Ching-Shan
    Lin, Ying-Jia
    Kao, Hung-Yu
    2022 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, TAAI, 2022: 149-154
  • [35] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    PATTERN RECOGNITION, 2022, 132
  • [36] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 10312-10321
  • [37] Question-aware dynamic scene graph of local semantic representation learning for visual question answering
    Wu, Jinmeng
    Ge, Fulin
    Hong, Hanyu
    Shi, Yu
    Hao, Yanbin
    Ma, Lei
    PATTERN RECOGNITION LETTERS, 2023, 170: 93-99
  • [38] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [39] Visual Question Answering With a Hybrid Convolution Recurrent Model
    Harzig, Philipp
    Eggert, Christian
    Lienhart, Rainer
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018: 318-325
  • [40] Context Relation Fusion Model for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 2112-2116