Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

Cited by: 56
Authors
Shao, Zhenwei [1]
Yu, Zhou [1]
Wang, Meng [2]
Yu, Jun [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
Funding
National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.01438
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve the required knowledge from explicit knowledge bases (KBs), which often introduces information irrelevant to the question and thus restricts the performance of their models. Recent work has sought to use a large language model (i.e., GPT-3 [3]) as an implicit knowledge engine to acquire the knowledge necessary for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of GPT-3, as the provided input information is insufficient. In this paper, we present Prophet, a conceptually simple framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. Specifically, we first train a vanilla VQA model on a specific knowledge-based VQA dataset without external knowledge. After that, we extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. Finally, the two types of answer heuristics are encoded into the prompts to enable GPT-3 to better comprehend the task, thus enhancing its capacity. Prophet significantly outperforms all existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1% and 55.7% accuracies on their testing sets, respectively.
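The final step described in the abstract, encoding answer candidates and answer-aware examples into a prompt, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the prompt layout, so the field names (`Context`, `Question`, `Candidates`, `Answer`), the instruction text, the `Example` class, and the helpers `top_k_candidates`, `format_example`, and `build_prompt` are all hypothetical. The sketch also assumes the answer-aware examples have already been selected, and it stops at producing the prompt string rather than calling any language model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Example:
    """One question plus its answer heuristics (all field names are hypothetical)."""
    context: str                         # e.g. a caption describing the image
    question: str
    candidates: List[Tuple[str, float]]  # (answer, confidence) pairs from the VQA model
    answer: str = ""                     # ground truth; left empty for the test input


def top_k_candidates(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Keep the k answers with the highest confidence from the VQA model's output."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def format_example(ex: Example) -> str:
    """Render one example; the layout is an assumption, not the paper's format."""
    cands = ", ".join(f"{ans} ({conf:.2f})" for ans, conf in ex.candidates)
    answer_line = "Answer:" + (f" {ex.answer}" if ex.answer else "")
    return (
        f"Context: {ex.context}\n"
        f"Question: {ex.question}\n"
        f"Candidates: {cands}\n"
        f"{answer_line}"
    )


def build_prompt(
    examples: List[Example],
    test_input: Example,
    instruction: str = "Answer the question using the context and the candidates.",
) -> str:
    """Concatenate instruction, in-context examples, and the unanswered test input."""
    blocks = [format_example(ex) for ex in examples] + [format_example(test_input)]
    return instruction + "\n\n" + "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy confidence scores standing in for a trained VQA model's predictions.
    demo = Example(
        context="A man rides a wave on a board.",
        question="What sport is this?",
        candidates=top_k_candidates({"surfing": 0.9, "skiing": 0.05, "sailing": 0.03}, 2),
        answer="surfing",
    )
    test = Example(
        context="A red truck parked next to a hydrant.",
        question="Who uses the object next to the truck?",
        candidates=top_k_candidates({"fireman": 0.6, "police": 0.2, "driver": 0.1}, 2),
    )
    print(build_prompt([demo], test))
```

The resulting string ends with an open `Answer:` line, so a language model given this prompt would be expected to complete it with its prediction for the test question.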
Pages: 14974-14983
Page count: 10