Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering

Cited by: 2
Authors
Lu, Qiwen [1 ]
Chen, Shengbo [1 ]
Zhu, Xiaoke [1 ]
Affiliations
[1] Henan Univ, Sch Comp & Informat Engn, Kaifeng 475001, Peoples R China
Keywords
visual question answering; collaborative learning; language bias
DOI
10.3390/jimaging10030056
Chinese Library Classification
TB8 [Photographic Technology]
Discipline Code
0804
Abstract
Language bias is a significant concern in visual question answering (VQA): models tend to rely on spurious correlations between questions and answers when making predictions, which prevents them from generalizing effectively and degrades performance. To address this bias, we propose a novel collaborative modality fusion de-biasing algorithm (CoD). In our approach, bias is treated as the model's neglect of information from a particular modality during prediction. We employ collaborative training to encourage mutual modeling between modalities, achieving efficient feature fusion and enabling the model to fully exploit multimodal knowledge for prediction. Experiments on the VQA-CP v2, VQA v2, and VQA-VS datasets, under different validation strategies, demonstrate the effectiveness of our approach. Notably, with a basic baseline model, CoD achieves an accuracy of 60.14% on VQA-CP v2.
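The record does not include the authors' implementation, so the following is only a minimal PyTorch sketch of what "collaborative training between modality branches" could look like: two unimodal heads and a fused head are all supervised, and each unimodal branch is additionally pulled toward the fused prediction so that neither modality can be ignored. All names here (CollaborativeFusionVQA, collaborative_loss, the alpha weight, the feature dimensions) are illustrative assumptions, not the paper's CoD algorithm.

```python
# Hypothetical sketch (not the authors' code): a fused multimodal predictor
# trained jointly with question-only and vision-only branches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeFusionVQA(nn.Module):
    def __init__(self, q_dim=1024, v_dim=2048, hid=512, n_answers=3129):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hid)      # question features -> hidden
        self.v_proj = nn.Linear(v_dim, hid)      # image features -> hidden
        self.q_head = nn.Linear(hid, n_answers)  # question-only predictor
        self.v_head = nn.Linear(hid, n_answers)  # vision-only predictor
        self.f_head = nn.Sequential(             # fused multimodal predictor
            nn.Linear(2 * hid, hid), nn.ReLU(), nn.Linear(hid, n_answers))

    def forward(self, q_feat, v_feat):
        q = F.relu(self.q_proj(q_feat))
        v = F.relu(self.v_proj(v_feat))
        f = self.f_head(torch.cat([q, v], dim=-1))
        return self.q_head(q), self.v_head(v), f

def collaborative_loss(q_logits, v_logits, f_logits, answer, alpha=0.5):
    # Supervise all three branches, then align each unimodal branch with the
    # (detached) fused prediction -- one possible reading of "mutual modeling".
    # The weight `alpha` is an assumed hyperparameter.
    ce = (F.cross_entropy(f_logits, answer)
          + alpha * (F.cross_entropy(q_logits, answer)
                     + F.cross_entropy(v_logits, answer)))
    soft_f = F.softmax(f_logits.detach(), dim=-1)
    kl = (F.kl_div(F.log_softmax(q_logits, dim=-1), soft_f, reduction="batchmean")
          + F.kl_div(F.log_softmax(v_logits, dim=-1), soft_f, reduction="batchmean"))
    return ce + alpha * kl
```

In a sketch like this, only the fused head would be used at inference: a batch of question features of shape (B, 1024) and image features of shape (B, 2048) yields answer logits of shape (B, 3129), while the unimodal heads exist solely to shape training.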
Pages: 15