A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge

Cited by: 24
Authors
Schwenk, Dustin [1]
Khandelwal, Apoorv [1]
Clark, Christopher [1]
Marino, Kenneth [2]
Mottaghi, Roozbeh [1]
Affiliations
[1] PRIOR Allen Inst AI, Seattle, WA 98103 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
DOI
10.1007/978-3-031-20074-8_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, little world knowledge needed outside of the paired image, and limited reasoning required to arrive at the correct answer. We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. In contrast to existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image. We demonstrate the potential of this new dataset through a detailed analysis of its contents and baseline performance measurements over a variety of state-of-the-art vision-language models.
Pages: 146-162
Number of pages: 17