Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool

Cited by: 17
Authors
Liu, Feng [1 ]
Xiang, Tao [2 ]
Hospedales, Timothy M. [3 ]
Yang, Wankou [1 ]
Sun, Changyin [1 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Queen Mary Univ London, Sch Elect Engn & Comp Sci, Comp Vis & Multimedia, London E1 4NS, England
[3] Univ Edinburgh, Sch Informat, IPAB, 10 Crichton St, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Benchmark testing; Visualization; Predictive models; Analytical models; Image color analysis; Knowledge discovery; Task analysis; Inverse visual question answering; VQA visualisation; visuo-linguistic understanding; reinforcement learning; NETWORKS
DOI
10.1109/TPAMI.2018.2880185
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405
Abstract
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1]. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct and content-correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA, because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by exposing its belief set: the set of question-answer pairs that the VQA model would predict true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.
Pages: 460 - 474
Page count: 15
Related papers
50 items in total
  • [21] Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
    Jain, Aman
    Kothyari, Mayank
    Kumar, Vishwajeet
    Jyothi, Preethi
    Ramakrishnan, Ganesh
    Chakrabarti, Soumen
    [J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2491 - 2498
  • [22] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Agrawal, Aishwarya
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (04) : 398 - 414
  • [23] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6325 - 6334
  • [24] Towards Video Text Visual Question Answering: Benchmark and Baseline
    Zhao, Minyi
    Li, Bingjia
    Wang, Jie
    Li, Wanqing
    Zhou, Wenjing
    Zhang, Lan
    Xuyang, Shijie
    Yu, Zhihang
    Yu, Xinkun
    Li, Guangze
    Dai, Aobotao
    Zhou, Shuigeng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [26] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
    Masry, Ahmed
    Long, Do Xuan
    Tan, Jia Qing
    Joty, Shafiq
    Hoque, Enamul
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2263 - 2279
  • [27] From image to language: A critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities
    Ishmam, Md. Farhan
    Shovon, Md. Sakib Hossain
    Mridha, M. F.
    Dey, Nilanjan
    [J]. INFORMATION FUSION, 2024, 106
  • [28] Regulating Balance Degree for More Reasonable Visual Question Answering Benchmark
    Lin, Ken
    Mao, Aihua
    Liu, Jiangfeng
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [29] A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge
    Schwenk, Dustin
    Khandelwal, Apoorv
    Clark, Christopher
    Marino, Kenneth
    Mottaghi, Roozbeh
    [J]. COMPUTER VISION, ECCV 2022, PT VIII, 2022, 13668 : 146 - 162
  • [30] Feasibility of Visual Question Answering (VQA) for Post-Disaster Damage Detection Using Aerial Footage
    Lowande, Rafael De Sa
    Sevil, Hakki Erhan
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (08):