HCCL: Hierarchical Counterfactual Contrastive Learning for Robust Visual Question Answering

Cited by: 0
Authors
Hao, Dongze [1 ,2 ]
Wang, Qunbo [1 ]
Zhu, Xinxin [1 ]
Liu, Jing [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; hierarchical counterfactual contrastive learning; robust VQA;
DOI
10.1145/3673902
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Although most state-of-the-art models achieve impressive performance on Visual Question Answering (VQA), they often rely on dataset biases to answer questions. Recently, some studies have synthesized counterfactual training samples to help models mitigate these biases. However, such synthetic samples require extra annotations and often contain noise. Moreover, these methods simply add the synthetic samples to the training data and train the model with the cross-entropy loss, which does not make the best use of the synthetic samples for bias mitigation. In this article, to mitigate biases in VQA more effectively, we propose a Hierarchical Counterfactual Contrastive Learning (HCCL) method. First, to avoid introducing noise and extra annotations, our method automatically masks unimportant features in the original question-image pairs to obtain positive samples, and creates mismatched question-image pairs as negative samples. Then, it applies feature-level and answer-level contrastive learning to pull the original sample close to its positive samples in the feature space, while pushing it away from negative samples in both the feature and answer spaces. In this way, the VQA model learns robust multimodal features and attends to both visual and linguistic information when producing an answer. HCCL can be adopted by different baselines, and experimental results on the VQA v2, VQA-CP, and GQA-OOD datasets show that it effectively mitigates biases in VQA and improves the robustness of VQA models.
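The abstract describes the method only at a high level, so the following is a minimal PyTorch sketch of how the two contrastive objectives could look. The tensor shapes, the InfoNCE formulation, the masking ratio, and all function names (mask_unimportant_features, feature_level_loss, answer_level_loss) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def mask_unimportant_features(feats, importance, keep_ratio=0.7):
    """Build a positive view by zeroing out the least-important features.

    feats:      (B, N, D) region/token features of the original pair
    importance: (B, N) importance scores (e.g., attention weights; assumed given)
    """
    k = max(1, int(feats.size(1) * keep_ratio))
    topk = importance.topk(k, dim=1).indices                  # keep important features
    mask = torch.zeros_like(importance).scatter_(1, topk, 1.0)
    return feats * mask.unsqueeze(-1)                         # unimportant ones masked


def feature_level_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss: pull the original sample toward its masked
    positive view and away from mismatched question-image pairs.

    anchor, positive: (B, D) fused multimodal embeddings
    negatives:        (B, K, D) embeddings of mismatched pairs
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / tau          # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / tau      # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)                    # positive is index 0


def answer_level_loss(anchor_logits, negative_logits, tau=0.1):
    """Repulsion in answer space: make the answer distribution of the
    original pair dissimilar to those predicted for negative pairs.

    anchor_logits:   (B, A) answer logits of the original sample
    negative_logits: (B, K, A) answer logits of the negative samples
    """
    p = F.softmax(anchor_logits, dim=-1)
    q = F.softmax(negative_logits, dim=-1)
    sim = torch.einsum("ba,bka->bk", p, q) / tau              # answer-space similarity
    return torch.logsumexp(sim, dim=1).mean()                 # minimizing pushes apart
```

In training, these terms would presumably be weighted and added to the standard VQA cross-entropy loss, e.g. loss = ce + w_f * feature_level_loss(...) + w_a * answer_level_loss(...), with the weights w_f and w_a treated as hyperparameters.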
Pages: 21