Question Type-Aware Debiasing for Test-Time Visual Question Answering Model Adaptation

Times cited: 0
Authors
Liu, Jin [1]
Xie, Jialong [1]
Zhou, Fengyu [1]
He, Shengfeng [2]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[2] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 178902, Singapore
Funding
National Research Foundation, Singapore
Keywords
Test-time adaptation; visual question answering; language debiasing
DOI
10.1109/TCSVT.2024.3410041
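Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract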
In Visual Question Answering (VQA), addressing language prior bias, where models excessively rely on superficial correlations between questions and answers, is crucial. This issue becomes more pronounced in real-world applications with diverse domains and varied question-answer distributions during testing. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA models to adapt using unlabeled test samples. Current state-of-the-art models select reliable test samples based on fixed entropy thresholds and employ self-supervised debiasing techniques. However, these methods struggle with diverse answer spaces linked to different question types and may fail to identify biased samples that still leverage relevant visual context. In this paper, we propose Question type-guided Entropy Minimization and Debiasing (QED) as a solution for test-time VQA model adaptation. Our approach involves adaptive entropy minimization based on question types to improve the identification of fine-grained and unreliable samples. Additionally, we generate negative samples for each test sample and label them as biased if their answer entropy change rate significantly differs from positive test samples, subsequently removing them. We evaluate our approach on two public benchmarks, VQA-CP v2 and VQA-CP v1, and achieve new state-of-the-art results, with overall accuracy rates of 48.13% and 46.18%, respectively.
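The abstract does not give the QED selection rule itself, so the following is only an illustrative sketch of the general idea of a question-type-dependent entropy threshold, not the authors' method. It assumes, hypothetically, that each question type has a known answer-space size K, that the predicted distribution is restricted to that answer space, and that the threshold is a fixed fraction of the maximum entropy log(K). The names `type_threshold`, `select_reliable`, and the `ratio` parameter are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def type_threshold(num_answers, ratio=0.4):
    """Hypothetical rule: a fraction of the maximum entropy log(K)
    for a question type whose answer space has K candidates."""
    return ratio * math.log(num_answers)


def select_reliable(samples, answer_space_sizes, ratio=0.4):
    """Keep samples whose prediction entropy is below the threshold
    of their own question type (assumed selection rule)."""
    reliable = []
    for qtype, probs in samples:
        limit = type_threshold(answer_space_sizes[qtype], ratio)
        if entropy(probs) < limit:
            reliable.append((qtype, probs))
    return reliable


# Toy data: (question type, predicted answer distribution).
answer_space_sizes = {"yes/no": 2, "number": 4}
samples = [
    ("yes/no", [0.95, 0.05]),              # confident binary prediction
    ("yes/no", [0.55, 0.45]),              # uncertain binary prediction
    ("number", [0.90, 0.05, 0.03, 0.02]),  # confident 4-way prediction
    ("number", [0.25, 0.25, 0.25, 0.25]),  # uniform, maximally uncertain
]
reliable = select_reliable(samples, answer_space_sizes)
```

In this toy setup the confident "number" prediction has higher entropy than the "yes/no" threshold, so a single fixed threshold tuned for binary questions would discard it, while the per-type threshold keeps it. That is the kind of mismatch between answer spaces that the abstract attributes to fixed-threshold selection.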
Pages: 10805-10816
Number of pages: 12