Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

被引:17
|
作者
Liu, Feng [1 ]
Xiang, Tao [2 ]
Hospedales, Timothy M. [3 ]
Yang, Wankou [1 ]
Sun, Changyin [1 ]
机构
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Queen Mary Univ London, Sch Elect Engn & Comp Sci, Comp Vis & Multimedia, London E1 4NS, England
[3] Univ Edinburgh, Sch Informat, IPAB, 10 Crichton St, Edinburgh EH8 9AB, Midlothian, Scotland
关键词
Benchmark testing; Visualization; Predictive models; Analytical models; Image color analysis; Knowledge discovery; Task analysis; Inverse visual question answering; VQA visualisation; visuo-linguistic understanding; reinforcement learning; NETWORKS;
D O I
10.1109/TPAMI.2018.2880185
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1] . In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct and content correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by way of exposing its belief set: the set of question-answer pairs that the VQA model would predict true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.
引用
收藏
页码:460 / 474
页数:15
相关论文
共 50 条
  • [1] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [2] VQA: Visual Question Answering
    Agrawal, Aishwarya
    Lu, Jiasen
    Antol, Stanislaw
    Mitchell, Margaret
    Zitnick, C. Lawrence
    Parikh, Devi
    Batra, Dhruv
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 4 - 31
  • [3] OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
    Marino, Kenneth
    Rastegari, Mohammad
    Farhadi, Ali
    Mottaghi, Roozbeh
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3190 - 3199
  • [4] Event-Oriented Visual Question Answering: The E-VQA Dataset and Benchmark
    Yang, Zhenguo
    Xiang, Jiale
    You, Jiuxiang
    Li, Qing
    Liu, Wenyin
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10210 - 10223
  • [5] RESCUENET-VQA: A LARGE-SCALE VISUAL QUESTION ANSWERING BENCHMARK FOR DAMAGE ASSESSMENT
    Sarkar, Argho
    Rahnemoonfar, Maryam
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 1150 - 1153
  • [6] VC-VQA: VISUAL CALIBRATION MECHANISM FOR VISUAL QUESTION ANSWERING
    Qiao, Yanyuan
    Yu, Zheng
    Liu, Jing
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1481 - 1485
  • [7] CQ-VQA: Visual Question Answering on Categorized Questions
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [8] CS-VQA: VISUAL QUESTION ANSWERING WITH COMPRESSIVELY SENSED IMAGES
    Huang, Li-Chi
    Kulkarni, Kuldeep
    Jha, Anik
    Lohit, Suhas
    Jayasuriya, Suren
    Turaga, Pavan
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 1283 - 1287
  • [9] iVQA: Inverse Visual Question Answering
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8611 - 8619
  • [10] VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering
    Narayanan, Abhishek
    Rao, Abijna
    Prasad, Abhishek
    Natarajan, S.
    [J]. IMAGE AND VISION COMPUTING, 2021, 116