Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

Cited by: 5
Authors
Garcia-Olano, Diego [1]
Onoe, Yasumasa [1]
Ghosh, Joydeep [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Keywords
visual question answering; knowledge injection; entity learning; multi-modal learning; explainability; weak supervision
DOI
10.1145/3487553.3524648
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge-Based Visual Question Answering (KBVQA) is a bimodal task requiring external world knowledge in order to correctly answer a text question and associated image. Recent single modality text work has shown knowledge injection into pre-trained language models, specifically entity enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically study how and whether such methods, applied in a bi-modal setting, can improve an existing VQA system's performance on the KBVQA task. We experiment with two large publicly available VQA datasets, (1) KVQA which contains mostly rare Wikipedia entities and (2) OKVQA which is less entity-centric and more aligned with common sense reasoning. Both lack explicit entity spans, and we study the effect of different weakly supervised and manual methods for obtaining them. Additionally, we analyze how recently proposed bi-modal and single modal attention explanations are affected by the incorporation of such entity enhanced representations. Our results show substantially improved performance on the KBVQA task without the need for additional costly pre-training, and we provide insights for when entity knowledge injection helps improve a model's understanding. We provide code and enhanced datasets for reproducibility.
Pages: 705-715
Number of pages: 11