Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering

Cited by: 2
Authors
Lu, Qiwen [1]
Chen, Shengbo [1]
Zhu, Xiaoke [1]
Affiliations
[1] Henan Univ, Sch Comp & Informat Engn, Kaifeng 475001, Peoples R China
Keywords
visual question answering; collaborative learning; language bias
DOI
10.3390/jimaging10030056
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Language bias stands as a noteworthy concern in visual question answering (VQA), wherein models tend to rely on spurious correlations between questions and answers for prediction. This prevents the models from effectively generalizing, leading to a decrease in performance. In order to address this bias, we propose a novel modality fusion collaborative de-biasing algorithm (CoD). In our approach, bias is considered as the model's neglect of information from a particular modality during prediction. We employ a collaborative training approach to facilitate mutual modeling between different modalities, achieving efficient feature fusion and enabling the model to fully leverage multimodal knowledge for prediction. Our experiments on various datasets, including VQA-CP v2, VQA v2, and VQA-VS, using different validation strategies, demonstrate the effectiveness of our approach. Notably, employing a basic baseline model resulted in an accuracy of 60.14% on VQA-CP v2.
Pages: 15