BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

Cited by: 0

Authors:

Kim, MinJun [1]

Song, SeungWoo [1]

Lee, YouHan [2]

Jang, Haneol [1]

Lim, KyungTae [3]

Affiliations:

[1] Hanbat Natl Univ, Daejeon, South Korea

[2] Kakao Brain, Seongnam, South Korea

[3] Seoul Natl Univ Sci & Technol, Seoul, South Korea

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16 | 2024

Funding:

National Research Foundation, Singapore;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The current research direction in generative models, such as the recently developed GPT-4, aims to find relevant knowledge for multimodal and multilingual inputs in order to provide answers. Against this background, the demand for multilingual evaluation of visual question answering (VQA), a representative multimodal task, has increased. Accordingly, we propose a bilingual outside-knowledge VQA (BOK-VQA) dataset that can be extended to additional languages. The proposed data include 17K images, 17K question-answer pairs in both Korean and English, and 280K instances of knowledge information related to the question-answer content. We also present a framework that can effectively inject knowledge information into a VQA system by pretraining the knowledge information of the BOK-VQA data in the form of graph embeddings. Finally, through in-depth analysis, we demonstrate the actual effect of the knowledge information contained in the constructed training data on VQA.


Pages: 18381-18389

Page count: 9

