VQA: Visual Question Answering

Cited by: 2360
Authors
Antol, Stanislaw [1 ]
Agrawal, Aishwarya [1 ]
Lu, Jiasen [1 ]
Mitchell, Margaret [2 ]
Batra, Dhruv [1 ]
Zitnick, C. Lawrence [2 ]
Parikh, Devi [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Microsoft Res, Cambridge, MA USA
Funding
U.S. National Science Foundation
DOI
10.1109/ICCV.2015.279
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
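The abstract's claim that VQA is amenable to automatic evaluation rests on the paper's consensus metric: each question is paired with ten human answers, and a predicted answer is scored min(#matching annotators / 3, 1), so any answer given by at least three annotators counts as fully correct. A minimal Python sketch of this scoring rule follows; the official evaluation script additionally normalizes answer strings and averages over annotator subsets, both omitted here for brevity.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus accuracy from the VQA benchmark: a predicted answer is
    fully correct if at least 3 of the 10 human annotators provided it.
    The official evaluation's string normalization (lowercasing,
    punctuation and article stripping) is omitted in this sketch."""
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)

# Ten crowd-sourced answers for a hypothetical question,
# e.g. "What color is the umbrella?"
answers = ["red"] * 7 + ["maroon"] * 2 + ["pink"]
print(vqa_accuracy("red", answers))     # 1.0  (7 >= 3 annotators agree)
print(vqa_accuracy("maroon", answers))  # ~0.67 (only 2 of 3 needed matches)
```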
Pages: 2425-2433 (9 pages)
Related Papers
50 items in total (items [21]-[30] shown)
  • [21] From image to language: A critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities
    Ishmam, Md. Farhan
    Shovon, Md. Sakib Hossain
    Mridha, M. F.
    Dey, Nilanjan
    [J]. INFORMATION FUSION, 2024, 106
  • [22] Feasibility of Visual Question Answering (VQA) for Post-Disaster Damage Detection Using Aerial Footage
    Lowande, Rafael De Sa
    Sevil, Hakki Erhan
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [23] Co-VQA: Answering by Interactive Sub Question Sequence
    Wang, Ruonan
    Qian, Yuxi
    Feng, Fangxiang
    Wang, Xiaojie
    Jiang, Huixing
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2396 - 2408
  • [24] TG-VQA: Ternary Game of Video Question Answering
    Li, Hao
    Jin, Peng
    Cheng, Zesen
    Zhang, Songyang
    Chen, Kai
    Wang, Zhennan
    Liu, Chang
    Chen, Jie
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1044 - 1052
  • [25] Fair-VQA: Fairness-Aware Visual Question Answering Through Sensitive Attribute Prediction
    Park, Sungho
    Hwang, Sunhee
    Hong, Jongkwang
    Byun, Hyeran
    [J]. IEEE ACCESS, 2020, 8 : 215091 - 215099
  • [26] A CASCADED LONG SHORT-TERM MEMORY (LSTM) DRIVEN GENERIC VISUAL QUESTION ANSWERING (VQA)
    Chowdhury, Iqbal
    Nguyen, Kien
    Fookes, Clinton
    Sridharan, Sridha
    [J]. 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 1842 - 1846
  • [27] VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
    Wang, Yanan
    Yasunaga, Michihiro
    Ren, Hongyu
    Wada, Shinya
    Leskovec, Jure
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21525 - 21535
  • [28] Post-Disaster Damage Detection using Aerial Footage: Visual Question Answering (VQA) Case Study
    Lowande, Rafael De Sa
    Mahyari, Arash
    Sevil, Hakki Erhan
    [J]. 2022 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP, AIPR, 2022,
  • [29] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [30] FTN-VQA: MULTIMODAL REASONING BY LEVERAGING A FULLY TRANSFORMER-BASED NETWORK FOR VISUAL QUESTION ANSWERING
    Wang, Runmin
    Xu, Weixiang
    Zhu, Yanbin
    Zhu, Zhenlin
    Chen, Hua
    Ding, Yajun
    Liu, Jinping
    Gao, Changxin
    Sang, Nong
    [J]. FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2023, 31 (06)