Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

Cited by: 30
Authors
Shao, Zhenwei [1]
Yu, Zhou [1]
Wang, Meng [2]
Yu, Jun [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
Funding
National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.01438
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question. Early studies retrieve required knowledge from explicit knowledge bases (KBs), which often introduces irrelevant information to the question, hence restricting the performance of their models. Recent works have sought to use a large language model (i.e., GPT-3 [3]) as an implicit knowledge engine to acquire the necessary knowledge for answering. Despite the encouraging results achieved by these methods, we argue that they have not fully activated the capacity of GPT-3, as the provided input information is insufficient. In this paper, we present Prophet, a conceptually simple framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. Specifically, we first train a vanilla VQA model on a specific knowledge-based VQA dataset without external knowledge. After that, we extract two types of complementary answer heuristics from the model: answer candidates and answer-aware examples. Finally, the two types of answer heuristics are encoded into the prompts to enable GPT-3 to better comprehend the task, thus enhancing its capacity. Prophet significantly outperforms all existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1% and 55.7% accuracies on their testing sets, respectively.
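The last two steps of the abstract (extracting answer heuristics and encoding them into a prompt) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the trained VQA model exposes, for each question, a feature vector and a map from answers to confidence scores; the `Example` fields, the cosine-similarity selection of examples, and the `Question / Candidates / Answer` prompt layout are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Example:
    """A training sample together with hypothetical VQA-model outputs."""
    question: str
    answer: str
    feature: list[float]      # assumed: feature vector produced by the VQA model
    scores: dict[str, float]  # assumed: answer -> confidence from the VQA model


def top_candidates(scores, k):
    """Return the k highest-scoring answers with their confidences."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_examples(query_feature, pool, n):
    """Pick the n pool samples whose features are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_feature, ex.feature), reverse=True)
    return ranked[:n]


def format_block(question, scores, k, answer=None):
    """Render one prompt block; the answer is left blank for the test question."""
    cands = ", ".join(f"{a} ({c:.2f})" for a, c in top_candidates(scores, k))
    answer_line = f"Answer: {answer or ''}".rstrip()
    return "\n".join([f"Question: {question}", f"Candidates: {cands}", answer_line])


def build_prompt(question, query_feature, query_scores, pool, n=2, k=2):
    """Concatenate n in-context examples followed by the unanswered test block."""
    blocks = [
        format_block(ex.question, ex.scores, k, ex.answer)
        for ex in nearest_examples(query_feature, pool, n)
    ]
    blocks.append(format_block(question, query_scores, k))
    return "\n\n".join(blocks)
```

The resulting string would be sent to the language model, whose completion after the final `Answer:` line is taken as the prediction. Listing candidates with their confidences lets the model either accept the top candidate or override it.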
Pages: 14974-14983
Number of pages: 10
Related papers
50 in total
  • [1] Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering
    Hu, Zhongjian
    Yang, Peng
    Liu, Fengyuan
    Meng, Yuan
    Liu, Xingyu
    [J]. BIG DATA MINING AND ANALYTICS, 2024, 7 (03): : 843 - 857
  • [2] Image captioning for effective use of language models in knowledge-based visual question answering
    Salaberria, Ander
    Azkune, Gorka
    Lacalle, Oier Lopez de
    Soroa, Aitor
    Agirre, Eneko
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 212
  • [3] Prompting large language model with context and pre-answer for knowledge-based VQA
    Hu, Zhongjian
    Yang, Peng
    Jiang, Yuanshuang
    Bai, Zijian
    [J]. PATTERN RECOGNITION, 2024, 151
  • [4] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [5] Knowledge-based question answering
    Rinaldi, F
    Dowdall, J
    Hess, M
    Mollá, D
    Schwitter, R
    Kaljurand, K
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 785 - 792
  • [6] Knowledge-based question answering
    Hermjakob, U
    Hovy, EH
    Lin, CY
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002, : 66 - 71
  • [7] Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
    Zhang, Liyang
    Liu, Shuaicheng
    Liu, Donghao
    Zeng, Pengpeng
    Li, Xiangpeng
    Song, Jingkuan
    Gao, Lianli
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4362 - 4373
  • [8] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208
  • [9] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Su, Zhenqiang
    Gou, Gang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2193 - 2208
  • [10] Cross-modal knowledge reasoning for knowledge-based visual question answering
    Yu, Jing
    Zhu, Zihao
    Wang, Yujing
    Zhang, Weifeng
    Hu, Yue
    Tan, Jianlong
    [J]. PATTERN RECOGNITION, 2020, 108