VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 2
Authors
Lao, Mingrui [1 ]
Guo, Yanming [2 ]
Chen, Wei [1 ]
Pu, Nan [1 ]
Lew, Michael S. [1 ]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing;
DOI
10.1109/ICASSP43922.2022.9746493
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) back to the crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining via label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
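The abstract names two loss ingredients: a hard-negative contrastive loss for the backward chain and label smoothing for the introspective regularization. A minimal sketch of both, assuming a generic InfoNCE-style contrastive formulation and standard smoothed cross-entropy (this is an illustration of the techniques named above, not the authors' released code; the function names, the temperature value, and the smoothing factor are all hypothetical):

```python
# Hypothetical sketch of the two loss terms named in the abstract.
# Pure Python for self-containment; real systems would use tensors.
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the anchor (answer embedding) toward the
    positive (question-related region feature) and away from hard negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # The positive sits at index 0; minimize its negative log-probability.
    return -math.log(softmax(logits)[0])

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution, which penalizes
    over-confident answer prediction in the forward chain."""
    k = len(logits)
    probs = softmax(logits)
    smoothed = [(1 - eps) if i == target else eps / (k - 1) for i in range(k)]
    return -sum(t * math.log(p) for t, p in zip(smoothed, probs))
```

Under this sketch, an anchor that matches a hard negative better than its positive incurs a larger contrastive loss, and a confident correct prediction incurs a larger penalty with smoothing than without, which is the intended regularizing effect.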
Pages: 4833-4837
Page count: 5
Related papers
50 in total
  • [21] R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
    Lu, Pan
    Ji, Lei
    Zhang, Wei
    Duan, Nan
    Zhou, Ming
    Wang, Jianyong
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1880 - 1889
  • [22] Adversarial Learning with Bidirectional Attention for Visual Question Answering
    Li, Qifeng
    Tang, Xinyi
    Jian, Yi
    SENSORS, 2021, 21 (21)
  • [23] Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA
    Vosoughi, Ali
    Deng, Shijian
    Zhang, Songyang
    Tian, Yapeng
    Xu, Chenliang
    Luo, Jiebo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8609 - 8624
  • [24] Generative Bias for Robust Visual Question Answering
    Cho, Jae Won
    Kim, Dong-Jin
    Ryu, Hyeonggon
    Kweon, In So
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11681 - 11690
  • [25] Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
    Naik, Nandita
    Potts, Christopher
    Kreiss, Elisa
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2813 - 2817
  • [26] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Agrawal, Aishwarya
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (04) : 398 - 414
  • [27] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6325 - 6334
  • [28] BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
    Kim, MinJun
    Song, SeungWoo
    Lee, YouHan
    Jang, Haneol
    Lim, KyungTae
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18381 - 18389
  • [29] Event-Oriented Visual Question Answering: The E-VQA Dataset and Benchmark
    Yang, Zhenguo
    Xiang, Jiale
    You, Jiuxiang
    Li, Qing
    Liu, Wenyin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10210 - 10223