Robust Visual Question Answering: Datasets, Methods, and Future Challenges

Cited by: 0
Authors
Ma, Jie [1]
Wang, Pinghui [1]
Kong, Dechen
Wang, Zewei
Liu, Jun
Pei, Hongbin
Zhao, Junzhou
Affiliations
[1] Xi'an Jiaotong Univ, Sch Cyber Sci & Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xi'an 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sports; Task analysis; Robustness; Transformers; Training; Question answering (information retrieval); Knowledge engineering; Vision-and-language pre-training; bias learning; debiasing; multi-modality learning; visual question answering; ATTENTION;
DOI
10.1109/TPAMI.2024.3366154
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual question answering (VQA) requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. As a result, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance VQA robustness, respectively. This paper provides the first comprehensive survey focused on this emerging research direction. Specifically, we first provide an overview of the development of datasets from in-distribution and out-of-distribution perspectives. Second, we examine the evaluation metrics employed by these datasets. Third, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Fourth, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss the key areas for future research from various viewpoints.
Pages: 5575-5594
Page count: 20