Exploiting Query Knowledge Embedding and Trilinear Joint Embedding for Visual Question Answering

Cited by: 0

Authors:

Chen, Zheng [1]

Wen, Yaxin [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 611731, Sichuan, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV | 2023 / Vol. 14089

Keywords:

Visual question answering; Attention mechanism; Knowledge base; Joint embedding

DOI:

10.1007/978-981-99-4752-2_64

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual Question Answering (VQA) aims to answer natural language questions about a given image. Researchers generally believe that incorporating external knowledge can improve performance on VQA tasks. However, existing methods are limited in how they acquire and use such knowledge, which prevents them from effectively enhancing a model's question-answering capability. In this paper, we propose a novel VQA approach based on question-query Knowledge Embedding. In our approach, we design question query rules to obtain critical external knowledge and then embed this knowledge by integrating it with the question as the input features of the text modality. Traditional multimodal feature fusion techniques rely solely on local features, which may result in the loss of global information. To address this issue, we introduce a feature fusion method based on Trilinear Joint Embedding. Using an attention mechanism, we generate a feature matrix composed of question, knowledge, and image components. This matrix is then trilinearly joint embedded to form a novel global feature vector. Because the trilinear joint embedding process produces high-dimensional vectors that are computationally expensive, we employ Tensor Decomposition to break this vector down into a sum of several low-rank tensors. We then feed the global feature vector into a classifier to obtain the answer as a multi-class classification problem. Experimental results on the VQAv2, OKVQA, and VizWiz public datasets demonstrate that our approach achieves accuracy improvements of 1.78%, 3.95%, and 1.16%, respectively. Our code is available at https://github.com/yxNoth/KB-VLT.
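
The low-rank idea behind a trilinear joint embedding can be illustrated with a small sketch. The abstract does not say which tensor decomposition is used, so the code below assumes a CP-style rank-R factorization, and every name in it (`trilinear_full`, `trilinear_low_rank`, the factor matrices `A`, `B`, `C`, `P`, and the toy dimensions) is hypothetical. It is not the authors' implementation; it only shows that projecting the question, knowledge, and image vectors into a shared rank-R space and multiplying them elementwise gives the same result as contracting with the full four-way tensor.

```python
import random


def trilinear_full(T, q, k, v):
    """z[o] = sum over i, j, l of T[o][i][j][l] * q[i] * k[j] * v[l]."""
    return [
        sum(T[o][i][j][l] * q[i] * k[j] * v[l]
            for i in range(len(q))
            for j in range(len(k))
            for l in range(len(v)))
        for o in range(len(T))
    ]


def project(M, x):
    """Return M^T x for a factor matrix M of shape (len(x), rank)."""
    rank = len(M[0])
    return [sum(M[i][r] * x[i] for i in range(len(x))) for r in range(rank)]


def trilinear_low_rank(P, A, B, C, q, k, v):
    """Low-rank form: project each modality, multiply elementwise, map to output."""
    qa, kb, vc = project(A, q), project(B, k), project(C, v)
    joint = [qa[r] * kb[r] * vc[r] for r in range(len(qa))]
    return [sum(row[r] * joint[r] for r in range(len(joint))) for row in P]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions): question, knowledge, image, output dims and rank.
rng = random.Random(0)
d_q, d_k, d_v, d_out, rank = 4, 3, 5, 2, 6
A, B, C = (rand_matrix(d, rank, rng) for d in (d_q, d_k, d_v))
P = rand_matrix(d_out, rank, rng)
q = [rng.uniform(-1, 1) for _ in range(d_q)]
k = [rng.uniform(-1, 1) for _ in range(d_k)]
v = [rng.uniform(-1, 1) for _ in range(d_v)]

# The full tensor is materialized here only to check the two forms agree.
T = [[[[sum(P[o][r] * A[i][r] * B[j][r] * C[l][r] for r in range(rank))
        for l in range(d_v)]
       for j in range(d_k)]
      for i in range(d_q)]
     for o in range(d_out)]

z_full = trilinear_full(T, q, k, v)
z_low = trilinear_low_rank(P, A, B, C, q, k, v)
max_diff = max(abs(a - b) for a, b in zip(z_full, z_low))
```

In this sketch the full tensor stores d_out × d_q × d_k × d_v weights, while the factorized form stores only rank × (d_q + d_k + d_v + d_out). That gap grows quickly with realistic feature sizes, which is the usual motivation for decomposing the interaction tensor.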


Pages: 780-791

Page count: 12
