Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Cited by: 0
Authors
Zhi, Xi [1 ,2 ]
Mao, Zhendong [3 ]
Liu, Chunxiao [1 ,2 ]
Zhang, Peng [1 ]
Wang, Bin [4 ]
Zhang, Yongdong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Xiaomi Inc, Xiaomi AI Lab, Beijing, Peoples R China
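Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes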
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer questions (e.g., what color is the banana?) based on high-frequency answers (e.g., yellow) while ignoring image contents. Existing approaches tackle this problem by creating delicate models or introducing additional visual annotations to reduce question dependency and strengthen image dependency. However, they are still subject to the language prior problem since the data biases have not been fundamentally addressed. In this paper, we introduce a self-supervised learning framework to solve this problem. Concretely, we first automatically generate labeled data to balance the biased data, and then propose a self-supervised auxiliary task that uses the balanced data to help the VQA model overcome language priors. Our method can compensate for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark, VQA-CP v2. In other words, we can increase the performance of annotation-based methods by 16% without using external annotations. Our code is available on GitHub.
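As a rough illustration of the abstract, the sketch below does two things. First, it checks that the reported 16% is the relative gain of 57.59% over 49.50%. Second, it shows one way labeled data could be generated automatically to balance question-image pairs: each original pair is labeled relevant, and a copy with a randomly swapped image is labeled irrelevant. The abstract does not describe the generation procedure, so the swapping strategy and the names `relative_gain` and `make_balanced_pairs` are assumptions made for illustration, not the authors' implementation.

```python
import random


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (both in percent)."""
    return (improved - baseline) / baseline


def make_balanced_pairs(pairs, rng):
    """Hypothetical generator of balanced, automatically labeled pairs.

    pairs: list of (question, image_id) tuples taken from the dataset.
    Returns (question, image_id, label) tuples, where label 1 marks the
    original (relevant) pair and label 0 marks a pair whose image was
    replaced by a different, randomly chosen one (assumed irrelevant).
    """
    image_ids = [image_id for _, image_id in pairs]
    labeled = []
    for question, image_id in pairs:
        labeled.append((question, image_id, 1))
        # Candidate replacement images: any image other than the original.
        candidates = [i for i in image_ids if i != image_id]
        if candidates:
            labeled.append((question, rng.choice(candidates), 0))
    return labeled


if __name__ == "__main__":
    # Accuracy figures quoted in the abstract for VQA-CP v2.
    print(f"relative gain: {relative_gain(49.50, 57.59):.1%}")

    toy_pairs = [
        ("what color is the banana?", "img_001"),
        ("how many dogs are there?", "img_002"),
        ("is the man wearing a hat?", "img_003"),
    ]
    for row in make_balanced_pairs(toy_pairs, random.Random(0)):
        print(row)
```

No manual annotation is needed because the labels follow from how each pair is constructed. An auxiliary task could then train the model to tell relevant pairs from irrelevant ones.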
Pages: 1083-1089
Number of pages: 7
Related papers
50 in total
  • [1] Overcoming language priors with self-contrastive learning for visual question answering
    Hong Yan
    Lijun Liu
    Xupeng Feng
    Qingsong Huang
    [J]. Multimedia Tools and Applications, 2023, 82 : 16343 - 16358
  • [2] Overcoming language priors with self-contrastive learning for visual question answering
    Yan, Hong
    Liu, Lijun
    Feng, Xupeng
    Huang, Qingsong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (11) : 16343 - 16358
  • [3] Overcoming language priors in visual question answering with cumulative learning strategy
    Mao, Aihua
    Chen, Feng
    Ma, Ziying
    Lin, Ken
    [J]. Neurocomputing, 2024, 608
  • [4] Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
    Ramakrishnan, Sainandan
    Agrawal, Aishwarya
    Lee, Stefan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [5] Overcoming Language Priors with Counterfactual Inference for Visual Question Answering
    Ren, Zhibo
    Wang, Huizhen
    Zhu, Muhua
    Wang, Yichao
    Xiao, Tong
    Zhu, Jingbo
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 58 - 71
  • [6] SELF-SUPERVISED VISION-LANGUAGE PRETRAINING FOR MEDIAL VISUAL QUESTION ANSWERING
    Li, Pengfei
    Liu, Gang
    Tan, Lin
    Liao, Jinying
    Zhong, Shenjun
    [J]. 2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023
  • [7] ASCL: Adaptive self-supervised counterfactual learning for robust visual question answering
    Shu, Xinyao
    Yan, Shiyang
    Yang, Xu
    Wu, Ziheng
    Chen, Zhongfeng
    Lu, Zhenyu
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [8] Simple contrastive learning in a self-supervised manner for robust visual question answering
    Yang, Shuwen
    Xiao, Luwei
    Wu, Xingjiao
    Xu, Junjie
    Wang, Linlin
    He, Liang
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
  • [9] elBERto: Self-supervised commonsense learning for question answering
    Zhan, Xunlin
    Li, Yuan
    Dong, Xiao
    Liang, Xiaodan
    Hu, Zhiting
    Carin, Lawrence
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [10] Overcoming Language Priors via Shuffling Language Bias for Robust Visual Question Answering
    Zhao, J.
    Yu, Z.
    Zhang, X.
    Yang, Y.
    [J]. IEEE ACCESS, 2023, 11 : 85980 - 85989