Exploiting Query Knowledge Embedding and Trilinear Joint Embedding for Visual Question Answering

Cited by: 0

Authors:

Chen, Zheng [1]

Wen, Yaxin [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 611731, Sichuan, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV | 2023, Vol. 14089

Keywords:

Visual question answering; Attention mechanism; Knowledge base; Joint embedding;

DOI:

10.1007/978-981-99-4752-2_64

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual Question Answering (VQA) aims to answer natural language questions about a given image. Researchers generally believe that incorporating external knowledge can improve performance on the VQA task. However, existing methods face limitations in acquiring and utilizing such knowledge, preventing them from effectively enhancing a model's question-answering capability. In this paper, we propose a novel VQA approach that uses question-based queries for knowledge embedding. In our approach, we design question query rules to obtain critical external knowledge and then embed this knowledge by integrating it with the question as input features for the text modality. Traditional multimodal feature fusion techniques rely solely on local features, which may result in the loss of global information. To address this issue, we introduce a feature fusion method based on Trilinear Joint Embedding. Utilizing an attention mechanism, we generate a feature matrix composed of question, knowledge, and image components. This matrix is then trilinearly joint embedded to form a novel global feature vector. Because the trilinear joint embedding process produces high-dimensional vectors that are computationally expensive, we employ tensor decomposition to break this vector down into a sum of several low-rank tensors. Subsequently, we input the global feature vector into a classifier to obtain the answer as a multi-class classification. Experimental results on the VQAv2, OKVQA, and VizWiz public datasets demonstrate that our approach achieves accuracy improvements of 1.78%, 3.95%, and 1.16%, respectively. Our code is available at https://github.com/yxNoth/KB-VLT.
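
The low-rank idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the decomposition, so a CP-style sum of R rank-one terms is assumed here, and all names (`trilinear_low_rank`, `trilinear_full`, `Wq`, `Wk`, `Wv`) and dimensions are hypothetical. The sketch fuses a question vector `q`, a knowledge vector `k`, and an image vector `v` without building the full four-way weight tensor, then checks the result against the explicit tensor contraction.

```python
import numpy as np


def trilinear_low_rank(q, k, v, Wq, Wk, Wv):
    """Fuse three vectors with a rank-R factorized trilinear map.

    q: (dq,), k: (dk,), v: (dv,)
    Wq: (R, dq, d_out), Wk: (R, dk, d_out), Wv: (R, dv, d_out)
    Returns a fused vector of shape (d_out,).
    """
    # Project each modality once per rank component.
    pq = np.einsum("i,rio->ro", q, Wq)
    pk = np.einsum("j,rjo->ro", k, Wk)
    pv = np.einsum("l,rlo->ro", v, Wv)
    # Element-wise product couples the three modalities; sum over rank.
    return (pq * pk * pv).sum(axis=0)


def trilinear_full(q, k, v, Wq, Wk, Wv):
    """Reference: materialize the full weight tensor and contract it."""
    # T has shape (d_out, dq, dk, dv), which is what the factorization avoids.
    T = np.einsum("rio,rjo,rlo->oijl", Wq, Wk, Wv)
    return np.einsum("oijl,i,j,l->o", T, q, k, v)


# Hypothetical toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
dq, dk, dv, d_out, R = 4, 3, 5, 6, 2
q, k, v = rng.normal(size=dq), rng.normal(size=dk), rng.normal(size=dv)
Wq = rng.normal(size=(R, dq, d_out))
Wk = rng.normal(size=(R, dk, d_out))
Wv = rng.normal(size=(R, dv, d_out))

z_low = trilinear_low_rank(q, k, v, Wq, Wk, Wv)
z_full = trilinear_full(q, k, v, Wq, Wk, Wv)
```

In this sketch the factorized form stores R * (dq + dk + dv) * d_out weights, whereas the full tensor needs d_out * dq * dk * dv entries, which is the computational saving the abstract attributes to tensor decomposition.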

Pages: 780-791

Page count: 12
