VQA: Visual Question Answering

被引:2360
|
作者
Antol, Stanislaw [1 ]
Agrawal, Aishwarya [1 ]
Lu, Jiasen [1 ]
Mitchell, Margaret [2 ]
Batra, Dhruv [1 ]
Zitnick, C. Lawrence [2 ]
Parikh, Devi [1 ]
机构
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Microsoft Res, Cambridge, MA USA
基金
美国国家科学基金会;
关键词
D O I
10.1109/ICCV.2015.279
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing similar to 0.25M images, similar to 0.76M questions, and similar to 10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
引用
收藏
页码:2425 / 2433
页数:9
相关论文
共 50 条
  • [41] Robust Explanations for Visual Question Answering
    Patro, Badri N.
    Patel, Shivansh
    Namboodiri, Vinay P.
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1566 - 1575
  • [42] Question action relevance and editing for visual question answering
    Andeep S. Toor
    Harry Wechsler
    Michele Nappi
    [J]. Multimedia Tools and Applications, 2019, 78 : 2921 - 2935
  • [43] Question -Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    [J]. NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [44] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [45] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [46] Visual Question Answering on 360° Images
    Chou, Shih-Han
    Chao, Wei-Lun
    Lai, Wei-Sheng
    Sun, Min
    Yang, Ming-Hsuan
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1596 - 1605
  • [47] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [48] Multi-Question Learning for Visual Question Answering
    Lei, Chenyi
    Wu, Lei
    Liu, Dong
    Li, Zhao
    Wang, Guoxin
    Tang, Haihong
    Li, Houqiang
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11328 - 11335
  • [49] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [50] Question action relevance and editing for visual question answering
    Toor, Andeep S.
    Wechsler, Harry
    Nappi, Michele
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 2921 - 2935