A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge

Cited by: 24
Authors
Schwenk, Dustin [1]
Khandelwal, Apoorv [1]
Clark, Christopher [1]
Marino, Kenneth [2]
Mottaghi, Roozbeh [1]
Affiliations
[1] PRIOR, Allen Institute for AI, Seattle, WA 98103, USA
[2] Carnegie Mellon University, Pittsburgh, PA, USA
DOI
10.1007/978-3-031-20074-8_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, little world knowledge needed outside of the paired image, and limited reasoning required to arrive at the correct answer. We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. In contrast to existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image. We demonstrate the potential of this new dataset through a detailed analysis of its contents and baseline performance measurements over a variety of state-of-the-art vision-language models.
Pages: 146-162
Page count: 17