Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering

Citations: 0
Authors
Liang, Zujie [1 ]
Jiang, Weitao [1 ]
Hu, Haifeng [1 ]
Zhu, Jiaying [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
In the task of Visual Question Answering (VQA), most state-of-the-art models tend to learn spurious correlations from the training set and therefore perform poorly on out-of-distribution test data. Several methods that generate counterfactual samples have been proposed to alleviate this problem. However, the counterfactual samples produced by most previous methods are simply added to the training data as augmentation and are not fully exploited. We therefore introduce a novel self-supervised contrastive learning mechanism that learns the relationships among original, factual, and counterfactual samples. With the better cross-modal joint embeddings learned from this auxiliary training objective, the reasoning capability and robustness of the VQA model are boosted significantly. We demonstrate the effectiveness of our method by surpassing current state-of-the-art models on the VQA-CP dataset, a diagnostic benchmark for assessing the robustness of VQA models.
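This record does not include the paper's actual loss formulation. As an illustration only, the following is a minimal sketch of one plausible way to contrast original, factual, and counterfactual cross-modal joint embeddings, assuming a triplet-style margin loss with cosine similarity; the function name, tensor shapes, and margin value are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: pull the factual embedding toward the original
# (anchor) and push the counterfactual embedding away. All names, shapes,
# and the margin value are assumptions, not the paper's formulation.
import torch
import torch.nn.functional as F


def counterfactual_contrastive_loss(z_orig, z_fact, z_cf, margin=0.2):
    """z_orig, z_fact, z_cf: (batch, dim) cross-modal joint embeddings of the
    original, factual, and counterfactual (image, question) pairs."""
    sim_pos = F.cosine_similarity(z_orig, z_fact, dim=-1)  # should be high
    sim_neg = F.cosine_similarity(z_orig, z_cf, dim=-1)    # should be low
    # Hinge on the similarity gap; zero loss once pos exceeds neg by `margin`.
    return F.relu(margin - sim_pos + sim_neg).mean()


if __name__ == "__main__":
    b, d = 8, 512
    loss = counterfactual_contrastive_loss(
        torch.randn(b, d), torch.randn(b, d), torch.randn(b, d)
    )
    print(loss.item())
```

In practice, an auxiliary term of this kind would typically be added, with a weighting coefficient, to the standard VQA answer-classification loss rather than used on its own.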
Pages: 3285-3292
Number of pages: 8
Related Papers
50 records in total
  • [1] Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Niu, Yulei
    Zhang, Hanwang
    Xiao, Jun
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 13218 - 13234
  • [2] Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception
    Qin, Hong
    An, Gaoyun
    Ruan, Qiuqi
    [J]. IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS, IGTA 2021, 2021, 1480 : 145 - 158
  • [3] ASCL: Adaptive self-supervised counterfactual learning for robust visual question answering
    Shu, Xinyao
    Yan, Shiyang
    Yang, Xu
    Wu, Ziheng
    Chen, Zhongfeng
    Lu, Zhenyu
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [4] Efficient Counterfactual Debiasing for Visual Question Answering
    Kolling, Camila
    More, Martin
    Gavenski, Nathan
    Pooch, Eduardo
    Parraga, Otavio
    Barros, Rodrigo C.
    [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2572 - 2581
  • [5] Robust visual question answering via polarity enhancement and contrast
    Peng, Dahe
    Li, Zhixin
    [J]. NEURAL NETWORKS, 2024, 179
  • [6] Counterfactual Mix-Up for Visual Question Answering
    Cho, Jae Won
    Kim, Dong-Jin
    Jung, Yunjae
    Kweon, In So
    [J]. IEEE ACCESS, 2023, 11 : 95201 - 95212
  • [7] Robust Explanations for Visual Question Answering
    Patro, Badri N.
    Patel, Shivansh
    Namboodiri, Vinay P.
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1566 - 1575
  • [8] Debiasing Medical Visual Question Answering via Counterfactual Training
    Zhan, Chenlu
    Peng, Peng
    Zhang, Hanrong
    Sun, Haiyue
    Shang, Chunnan
    Chen, Tao
    Wang, Hongsen
    Wang, Gaoang
    Wang, Hongwei
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221 : 382 - 393
  • [9] COIN: Counterfactual Image Generation for Visual Question Answering Interpretation
    Boukhers, Zeyd
    Hartmann, Timo
    Juerjens, Jan
    [J]. SENSORS, 2022, 22 (06)
  • [10] Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
    Ren, Zhibo
    Wang, Huizhen
    Zhu, Muhua
    Wang, Yichao
    Xiao, Tong
    Zhu, Jingbo
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 58 - 71