Towards bias-aware visual question answering: Rectifying and mitigating comprehension biases

Cited: 0
Authors
Chen, Chongqing [1 ]
Han, Dezhi [1 ]
Guo, Zihan [2 ]
Chang, Chin-Chen [3 ]
Affiliations
[1] Shanghai Maritime Univ, Sch Informat Engn, Shanghai 201306, Peoples R China
[2] Changzhi Univ, Dept Comp Sci, Changzhi 046011, Shanxi, Peoples R China
[3] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 407, Taiwan
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Comprehension biases; Relational dependency modeling; Visual question answering (VQA); Inference capability; Contextual information; SELF-ATTENTION NETWORKS; LANGUAGE;
DOI
10.1016/j.eswa.2024.125817
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformers have become essential for capturing intra- and inter-dependencies in visual question answering (VQA). Yet, challenges remain in overcoming inherent comprehension biases and improving the relational dependency modeling and reasoning capabilities crucial for VQA tasks. This paper presents RMCB, a novel VQA model designed to mitigate these biases by integrating contextual information from both visual and linguistic sources and addressing potential comprehension limitations at each end. RMCB introduces enhanced relational modeling for language tokens by leveraging textual context, addressing comprehension biases arising from the isolated pairwise modeling of token relationships. For the visual component, RMCB systematically incorporates both absolute and relative spatial relational information as contextual cues for image tokens, refining dependency modeling and strengthening inferential reasoning to alleviate biases caused by limited contextual understanding. The model's effectiveness was evaluated on benchmark datasets VQA-v2 and CLEVR, achieving state-of-the-art results with accuracies of 71.78% and 99.27%, respectively. These results underscore RMCB's capability to effectively address comprehension biases while advancing the relational reasoning needed for VQA.
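The absolute and relative spatial cues that the abstract describes for image tokens can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, and the specific choice of features (normalized box coordinates plus area for absolute position; pairwise center offsets and log size ratios for relative position, as commonly used in relation-aware attention) are assumptions for illustration only.

```python
import numpy as np

def absolute_position_features(boxes, img_w, img_h):
    """Absolute spatial cues per image region: bounding-box corners
    (x1, y1, x2, y2) normalized by image size, plus relative area.
    Returns an array of shape (N, 5)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    w, h = x2 - x1, y2 - y1
    return np.stack(
        [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
         (w * h) / (img_w * img_h)],
        axis=1,
    )

def relative_position_features(boxes):
    """Pairwise relative geometry between regions i and j:
    center offsets scaled by box i's size, and log width/height
    ratios. Returns an array of shape (N, N, 4) that could bias
    attention between image tokens."""
    boxes = np.asarray(boxes, dtype=np.float64)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    dx = (cx[None, :] - cx[:, None]) / w[:, None]  # horizontal center offset
    dy = (cy[None, :] - cy[:, None]) / h[:, None]  # vertical center offset
    dw = np.log(w[None, :] / w[:, None])           # log width ratio
    dh = np.log(h[None, :] / h[:, None])           # log height ratio
    return np.stack([dx, dy, dw, dh], axis=-1)
```

In a relation-aware Transformer these features are typically embedded and added as a bias inside the attention scores between image tokens, so that dependency modeling is conditioned on where regions sit in the image, not only on their appearance.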
Pages: 14