VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 2
Authors
Lao, Mingrui [1 ]
Guo, Yanming [2 ]
Chen, Wei [1 ]
Pu, Nan [1 ]
Lew, Michael S. [1 ]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing
DOI
10.1109/ICASSP43922.2022.9746493
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) back to the crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining through label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on out-of-distribution data, but also alleviates the over-correction problem caused by ensemble-based methods on in-distribution data. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
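The abstract names two standard building blocks, hard-negative contrastive learning and label smoothing, without giving their exact formulation in VQA-BC. The PyTorch sketch below shows generic versions of both losses under assumed tensor shapes and hypothetical names (anchor, positive, negatives, epsilon, temperature); it illustrates the general techniques only, not the authors' implementation.

import torch
import torch.nn.functional as F

def label_smoothed_ce(logits, target, epsilon=0.1):
    # Generic label-smoothed cross-entropy over answer classes:
    # (1 - epsilon) weight on the gold answer, epsilon spread uniformly.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()

def hard_negative_contrastive(anchor, positive, negatives, temperature=0.07):
    # InfoNCE-style loss: pull each anchor toward its positive feature and
    # push it away from K hard-negative features.
    # Hypothetical shapes: anchor (B, D), positive (B, D), negatives (B, K, D).
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)   # (B, 1)
    neg_sim = torch.einsum('bd,bkd->bk', anchor, negatives)   # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

In a VQA-BC-style setup, the contrastive term would plausibly act on answer-conditioned queries and visual region features during backward chaining, while the label-smoothing term regularizes the forward-chaining answer classifier; the wiring shown here is illustrative only.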
Pages: 4833-4837
Page count: 5