iVQA: Inverse Visual Question Answering

被引:24
|
作者
Liu, Feng [1 ]
Xiang, Tao [2 ]
Hospedales, Timothy M. [3 ]
Yang, Wankou [1 ]
Sun, Changyin [1 ]
机构
[1] Southeast Univ, Nanjing, Jiangsu, Peoples R China
[2] Queen Mary Univ London, London, England
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
关键词
D O I
10.1109/CVPR.2018.00898
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose the inverse problem of Visual question answering (iVQA), and explore its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair. Since the answers are less informative than the questions, and the questions have less learnable bias, an iVQA model needs to better understand the image to be successful than a VQA model. We pose question generation as a multi-modal dynamic inference process and propose an iVQA model that can gradually adjust its focus of attention guided by both a partially generated question and the answer. For evaluation, apart from existing linguistic metrics, we propose a new ranking metric. This metric compares the ground truth question's rank among a list of distractors, which allows the drawbacks of different algorithms and sources of error to be studied. Experimental results show that our model can generate diverse, grammatically correct and content correlated questions that match the given answer
引用
收藏
页码:8611 / 8619
页数:9
相关论文
共 50 条
  • [21] Multi-Question Learning for Visual Question Answering
    Lei, Chenyi
    Wu, Lei
    Liu, Dong
    Li, Zhao
    Wang, Guoxin
    Tang, Haihong
    Li, Houqiang
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11328 - 11335
  • [22] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [23] Structured Attentions for Visual Question Answering
    Zhu, Chen
    Zhao, Yanpeng
    Huang, Shuaiyi
    Tu, Kewei
    Ma, Yi
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1300 - 1309
  • [24] An Analysis of Visual Question Answering Algorithms
    Kafle, Kushal
    Kanan, Christopher
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1983 - 1991
  • [25] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    [J]. INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [26] Affective Visual Question Answering Network
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Dong, Ming
    [J]. IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 170 - 173
  • [27] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [28] Revisiting Visual Question Answering Baselines
    Jabri, Allan
    Joulin, Armand
    van der Maaten, Laurens
    [J]. COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 : 727 - 739
  • [29] Visual Question Answering as Reading Comprehension
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    van den Hengel, Anton
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6312 - 6321
  • [30] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    [J]. COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87