VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited: 2
Authors
Lao, Mingrui [1 ]
Guo, Yanming [2 ]
Chen, Wei [1 ]
Pu, Nan [1 ]
Lew, Michael S. [1 ]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing
DOI
10.1109/ICASSP43922.2022.9746493
CLC Number
O42 [Acoustics]
Discipline Code
070206; 082403
Abstract
Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) back to the crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining via label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
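The abstract names two concrete training signals: a hard-negative contrastive loss for the backward-chaining step and label smoothing for the forward-chaining answer classifier. Below is a minimal PyTorch sketch of what such losses typically look like; it is not the authors' released code, and all tensor shapes, the temperature, the smoothing factor, and the loss weighting are illustrative assumptions.

```python
# Sketch of the two loss components described in the abstract (assumptions
# throughout; not the VQA-BC reference implementation).
import torch
import torch.nn.functional as F


def hard_negative_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the anchor (answer-conditioned query) toward
    the positive (question-related region feature) and push it away from
    hard negatives (unrelated region features).

    anchor: (B, D)   positive: (B, D)   negatives: (B, K, D)
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)          # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)      # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature  # (B, 1+K)
    # The positive similarity always sits at index 0.
    targets = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, targets)


def smoothed_answer_loss(logits, target_idx, num_answers, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution, the
    standard way to counter over-confident answer prediction."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Spread eps uniformly over the non-target answers, 1 - eps on the target.
    smooth = torch.full_like(log_probs, eps / (num_answers - 1))
    smooth.scatter_(1, target_idx.unsqueeze(1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    B, D, K, A = 4, 256, 8, 3000  # batch, feature dim, negatives, answer vocab
    loss_bc = hard_negative_contrastive_loss(
        torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
    loss_fc = smoothed_answer_loss(torch.randn(B, A), torch.randint(0, A, (B,)), A)
    total = loss_fc + 0.5 * loss_bc  # the 0.5 weighting is an assumption
    print(total.item())
```

The combined objective ties the two directions together in the spirit the abstract describes: the forward classifier is kept from becoming over-confident while the backward contrastive term grounds answers in the relevant visual regions.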
Pages: 4833-4837
Page count: 5
Related Papers (50 records in total)
  • [41] Bidirectional cascaded multimodal attention for multiple choice visual question answering
    Upadhyay, Sushmita
    Tripathy, Sanjaya Shankar
    MACHINE VISION AND APPLICATIONS, 2025, 36 (2)
  • [42] A Cascaded Long Short-Term Memory (LSTM) Driven Generic Visual Question Answering (VQA)
    Chowdhury, Iqbal
    Nguyen, Kien
    Fookes, Clinton
    Sridharan, Sridha
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 1842 - 1846
  • [43] Self-Critical Reasoning for Robust Visual Question Answering
    Wu, Jialin
    Mooney, Raymond J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [44] Fair-VQA: Fairness-Aware Visual Question Answering Through Sensitive Attribute Prediction
    Park, Sungho
    Hwang, Sunhee
    Hong, Jongkwang
    Byun, Hyeran
    IEEE ACCESS, 2020, 8 : 215091 - 215099
  • [45] SQT: Debiased Visual Question Answering via Shuffling Question Types
    Huai, Tianyu
    Yang, Shuwen
    Zhang, Junhang
    Wang, Guoan
    Yu, Xinru
    Ma, Tianlong
    He, Liang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 600 - 605
  • [46] Robust Visual Question Answering: Datasets, Methods, and Future Challenges
    Ma, Jie
    Wang, Pinghui
    Kong, Dechen
    Wang, Zewei
    Liu, Jun
    Pei, Hongbin
    Zhao, Junzhou
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5575 - 5594
  • [47] Modular Visual Question Answering via Code Generation
    Subramanian, Sanjay
    Narasimhan, Medhini
    Khangaonkar, Kushal
    Yang, Kevin
    Nagrani, Arsha
    Schmid, Cordelia
    Zeng, Andy
    Darrell, Trevor
    Klein, Dan
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 747 - 761
  • [48] Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Niu, Yulei
    Zhang, Hanwang
    Xiao, Jun
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 13218 - 13234
  • [49] Exploring and exploiting model uncertainty for robust visual question answering
    School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
    MULTIMEDIA SYSTEMS, 6 (6)
  • [50] Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering
    Liang, Zujie
    Jiang, Weitao
    Hu, Haifeng
    Zhu, Jiaying
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3285 - 3292