HCCL: Hierarchical Counterfactual Contrastive Learning for Robust Visual Question Answering

Authors
Hao, Dongze [1, 2]
Wang, Qunbo [1]
Zhu, Xinxin [1]
Liu, Jing [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual question answering; hierarchical counterfactual contrastive learning; robust VQA
DOI
10.1145/3673902
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Although most state-of-the-art models achieve strong performance in Visual Question Answering (VQA), they usually exploit biases to answer the question. Recently, some studies have synthesized counterfactual training samples to help the model mitigate these biases. However, these synthetic samples require extra annotations and often contain noise. Moreover, these methods simply add the synthetic samples to the training data and train the model with the cross-entropy loss, which does not make the best use of the synthetic samples to mitigate the biases. In this article, to mitigate the biases in VQA more effectively, we propose a Hierarchical Counterfactual Contrastive Learning (HCCL) method. First, to avoid introducing noise and extra annotations, our method automatically masks the unimportant features in the original question-image pairs to obtain positive samples and creates mismatched question-image pairs as negative samples. Then, our method uses feature-level and answer-level contrastive learning to pull the original sample close to the positive samples in the feature space while pushing it away from the negative samples in both the feature and answer spaces. In this way, the VQA model can learn robust multimodal features and attend to both visual and language information when producing the answer. Our HCCL method can be applied to different baselines, and experimental results on the VQA v2, VQA-CP, and GQA-OOD datasets show that it effectively mitigates the biases in VQA and thereby improves the robustness of the VQA model.
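The sample construction and the feature-level contrastive objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how feature importance is scored or which loss form is used, so the importance scores, the `keep_ratio` and `temperature` parameters, the InfoNCE-style loss, and all function names below are assumptions chosen for illustration. The answer-level term is omitted because the abstract gives no detail on it.

```python
import math


def mask_unimportant(features, importance, keep_ratio=0.5):
    """Build a positive sample by zeroing the least important feature vectors.

    `importance` is an assumed per-feature score; the source does not say
    how importance is computed.
    """
    k = max(1, int(round(len(features) * keep_ratio)))
    # Indices of the k features with the highest importance scores.
    order = sorted(range(len(features)), key=lambda i: importance[i], reverse=True)
    keep = set(order[:k])
    return [f if i in keep else [0.0] * len(f) for i, f in enumerate(features)]


def mismatched_pairs(questions, images):
    """Build negative samples by pairing each question with the next sample's image.

    Assumes a batch of at least two samples; with one sample the pair
    would not be mismatched.
    """
    n = len(images)
    return [(questions[i], images[(i + 1) % n]) for i in range(n)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def feature_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form): small when the anchor is close
    to the positive and far from every negative in feature space."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Usage: the loss is lower when the positive is aligned with the anchor.
anchor = [1.0, 0.0]
loss_good = feature_contrastive_loss(anchor, [0.9, 0.1], [[-1.0, 0.0]])
loss_bad = feature_contrastive_loss(anchor, [-1.0, 0.0], [[0.9, 0.1]])
assert loss_good < loss_bad
```

In a real VQA model the vectors would be fused multimodal features produced by the network, and this term would be added to the usual answer-classification loss rather than replace it.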
Pages: 21