Towards bias-aware visual question answering: Rectifying and mitigating comprehension biases

Cited: 0
Authors
Chen, Chongqing [1 ]
Han, Dezhi [1 ]
Guo, Zihan [2 ]
Chang, Chin-Chen [3 ]
Affiliations
[1] Shanghai Maritime Univ, Sch Informat Engn, Shanghai 201306, Peoples R China
[2] Changzhi Univ, Dept Comp Sci, Changzhi 046011, Shanxi, Peoples R China
[3] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 407, Taiwan
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Comprehension biases; Relational dependency modeling; Visual question answering (VQA); Inference capability; Contextual information; SELF-ATTENTION NETWORKS; LANGUAGE;
DOI
10.1016/j.eswa.2024.125817
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformers have become essential for capturing intra- and inter-dependencies in visual question answering (VQA). Yet, challenges remain in overcoming inherent comprehension biases and improving the relational dependency modeling and reasoning capabilities crucial for VQA tasks. This paper presents RMCB, a novel VQA model designed to mitigate these biases by integrating contextual information from both visual and linguistic sources and addressing potential comprehension limitations at each end. RMCB introduces enhanced relational modeling for language tokens by leveraging textual context, addressing comprehension biases arising from the isolated pairwise modeling of token relationships. For the visual component, RMCB systematically incorporates both absolute and relative spatial relational information as contextual cues for image tokens, refining dependency modeling and strengthening inferential reasoning to alleviate biases caused by limited contextual understanding. The model's effectiveness was evaluated on benchmark datasets VQA-v2 and CLEVR, achieving state-of-the-art results with accuracies of 71.78% and 99.27%, respectively. These results underscore RMCB's capability to effectively address comprehension biases while advancing the relational reasoning needed for VQA.
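The absolute and relative spatial cues that the abstract describes for image tokens can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, and the specific choice of features (normalized box coordinates plus area for absolute position; pairwise center offsets and log size ratios for relative position, as commonly used in relation-aware attention) are assumptions for illustration only.

```python
import numpy as np

def absolute_position_features(boxes, img_w, img_h):
    """Absolute spatial cues per image region: bounding-box corners
    (x1, y1, x2, y2) normalized by image size, plus relative area.
    Returns an array of shape (N, 5)."""
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    w, h = x2 - x1, y2 - y1
    return np.stack(
        [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
         (w * h) / (img_w * img_h)],
        axis=1,
    )

def relative_position_features(boxes):
    """Pairwise relative geometry between regions i and j:
    center offsets scaled by box i's size, and log width/height
    ratios. Returns an array of shape (N, N, 4) that could bias
    attention between image tokens."""
    boxes = np.asarray(boxes, dtype=np.float64)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    dx = (cx[None, :] - cx[:, None]) / w[:, None]  # horizontal center offset
    dy = (cy[None, :] - cy[:, None]) / h[:, None]  # vertical center offset
    dw = np.log(w[None, :] / w[:, None])           # log width ratio
    dh = np.log(h[None, :] / h[:, None])           # log height ratio
    return np.stack([dx, dy, dw, dh], axis=-1)
```

In a relation-aware Transformer these features are typically embedded and added as a bias inside the attention scores between image tokens, so that dependency modeling is conditioned on where regions sit in the image, not only on their appearance.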
Pages: 14