BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

Cited by: 0
Authors
Kim, MinJun [1 ]
Song, SeungWoo [1 ]
Lee, YouHan [2 ]
Jang, Haneol [1 ]
Lim, KyungTae [3 ]
Affiliations
[1] Hanbat Natl Univ, Daejeon, South Korea
[2] Kakao Brain, Seongnam, South Korea
[3] Seoul Natl Univ Sci & Technol, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The current research direction in generative models, such as the recently developed GPT-4, aims to find relevant knowledge information for multimodal and multilingual inputs to provide answers. Under these research circumstances, the demand for multilingual evaluation of visual question answering (VQA) tasks, a representative task of multimodal systems, has increased. Accordingly, in this study we propose a bilingual outside-knowledge VQA (BOK-VQA) dataset that can be extended to multilingualism. The proposed data include 17K images, 17K question-answer pairs for both Korean and English, and 280K instances of knowledge information related to the question-answer content. We also present a framework that can effectively inject knowledge information into a VQA system by pretraining the knowledge information of BOK-VQA data in the form of graph embeddings. Finally, through in-depth analysis, we demonstrate the actual effect of the knowledge information contained in the constructed training data on VQA.
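The abstract's idea of pretraining the dataset's knowledge information as graph embeddings can be illustrated with a minimal sketch. The snippet below assumes a TransE-style translation objective over (head, relation, tail) triples and PyTorch; the names (TripleEmbedder, margin_loss) and the choice of TransE are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumption, not the paper's code): pretraining knowledge-graph
# embeddings over (head, relation, tail) triples such as BOK-VQA's 280K
# knowledge instances, using a TransE-style translation objective.
import torch
import torch.nn as nn

class TripleEmbedder(nn.Module):
    def __init__(self, n_entities: int, n_relations: int, dim: int = 200):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)

    def score(self, h, r, t):
        # TransE: a triple is plausible when head + relation ≈ tail,
        # so a smaller distance means a more plausible triple.
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

def margin_loss(model, pos, neg, margin: float = 1.0):
    # pos/neg: LongTensors of shape (batch, 3) holding (h, r, t) ids;
    # negatives are corrupted triples (random head or tail replacement).
    pos_d = model.score(pos[:, 0], pos[:, 1], pos[:, 2])
    neg_d = model.score(neg[:, 0], neg[:, 1], neg[:, 2])
    return torch.clamp(margin + pos_d - neg_d, min=0).mean()
```

After pretraining, the learned entity and relation vectors would be fused with the image-question representation before answer prediction; the exact fusion mechanism used in the paper is not reproduced here.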
Pages: 18381 - 18389
Page count: 9