Question Type-Aware Debiasing for Test-Time Visual Question Answering Model Adaptation

Cited by: 0
Authors
Liu, Jin [1 ]
Xie, Jialong [1 ]
Zhou, Fengyu [1 ]
He, Shengfeng [2 ]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China
[2] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 178902, Singapore
Funding
National Research Foundation, Singapore
Keywords
Test-time adaptation; visual question answering; language debiasing
DOI
10.1109/TCSVT.2024.3410041
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology]
Discipline Codes
0808; 0809
Abstract
In Visual Question Answering (VQA), addressing language prior bias, where models excessively rely on superficial correlations between questions and answers, is crucial. This issue becomes more pronounced in real-world applications with diverse domains and varied question-answer distributions during testing. To tackle this challenge, Test-time Adaptation (TTA) has emerged, allowing pre-trained VQA models to adapt using unlabeled test samples. Current state-of-the-art models select reliable test samples based on fixed entropy thresholds and employ self-supervised debiasing techniques. However, these methods struggle with the diverse answer spaces linked to different question types and may fail to identify biased samples that still leverage relevant visual context. In this paper, we propose Question type-guided Entropy Minimization and Debiasing (QED) as a solution for test-time VQA model adaptation. Our approach involves adaptive entropy minimization based on question types to improve the identification of fine-grained and unreliable samples. Additionally, we generate negative samples for each test sample and label them as biased if their answer entropy change rate differs significantly from that of positive test samples, subsequently removing them. We evaluate our approach on two public benchmarks, VQA-CP v2 and VQA-CP v1, and achieve new state-of-the-art results, with overall accuracy rates of 48.13% and 46.18%, respectively.
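The abstract's key selection idea — replacing one fixed entropy cutoff with per-question-type thresholds — can be illustrated with a minimal sketch. This is not the authors' implementation; the threshold rule (a margin times each type's mean entropy) and all names below are illustrative assumptions, standing in for whatever adaptive rule QED actually uses:

```python
import math

def entropy(probs):
    """Shannon entropy of an answer distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_reliable(samples, margin=0.9):
    """Keep samples whose answer entropy falls below an adaptive,
    per-question-type threshold (here: margin * that type's mean entropy).
    `samples` is a list of (question_type, answer_probs) pairs."""
    # 1. Group sample entropies by question type.
    by_type = {}
    for qtype, probs in samples:
        by_type.setdefault(qtype, []).append(entropy(probs))
    # 2. One threshold per question type, instead of a single global cutoff.
    thresholds = {t: margin * sum(es) / len(es) for t, es in by_type.items()}
    # 3. Judge each sample against its own type's threshold.
    return [(qtype, probs) for qtype, probs in samples
            if entropy(probs) < thresholds[qtype]]

samples = [
    ("yes/no",   [0.95, 0.05]),          # confident for a 2-way answer space
    ("yes/no",   [0.55, 0.45]),          # uncertain
    ("counting", [0.7, 0.1, 0.1, 0.1]),  # confident for a larger answer space
    ("counting", [0.3, 0.3, 0.2, 0.2]),  # uncertain
]
reliable = select_reliable(samples)  # keeps the two confident samples
```

Note the motivation for the per-type rule: a single global threshold derived from all four entropies (about 0.72 nats here) would reject the confident counting sample (about 0.94 nats), because larger answer spaces naturally carry higher entropy; the per-type threshold keeps it.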
Pages: 10805-10816 (12 pages)