Exploiting Query Knowledge Embedding and Trilinear Joint Embedding for Visual Question Answering

Authors
Chen, Zheng [1]
Wen, Yaxin [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 611731, Sichuan, Peoples R China
Keywords
Visual question answering; Attention mechanism; Knowledge base; Joint embedding
DOI
10.1007/978-981-99-4752-2_64
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) aims to answer natural language questions about a given image. Researchers generally believe that incorporating external knowledge can improve performance on the VQA task. However, existing methods are limited in how they acquire and use such knowledge, which prevents them from effectively enhancing a model's question-answering capability. In this paper, we propose a novel VQA approach based on question-query Knowledge Embedding. We design question query rules to obtain critical external knowledge, then embed this knowledge by integrating it with the question as the input features for the text modality. Traditional multimodal feature fusion techniques rely solely on local features, which may result in the loss of global information. To address this issue, we introduce a feature fusion method based on Trilinear Joint Embedding. Using an attention mechanism, we generate a feature matrix composed of question, knowledge, and image components. This matrix is then trilinearly joint-embedded to form a novel global feature vector. Because the trilinear joint embedding produces high-dimensional vectors that are computationally expensive to handle, we employ Tensor Decomposition to break this vector down into a sum of several low-rank tensors. We then feed the global feature vector into a classifier to obtain the answer through multi-class classification. Experimental results on the VQAv2, OKVQA, and VizWiz public datasets show that our approach achieves accuracy improvements of 1.78%, 3.95%, and 1.16%, respectively. Our code is available at https://github.com/yxNoth/KB-VLT.
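The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says the trilinear embedding is decomposed into a sum of low-rank tensors, so the CP-style (rank-R) factorization used here is an assumption, and every name (`full_trilinear`, `low_rank_trilinear`, `cp_to_full`, `Wq`, `Wk`, `Wv`, `P`) is hypothetical. The sketch shows that projecting the question, knowledge, and image vectors to R dimensions, multiplying them elementwise, and projecting to the output gives the same result as contracting with the full fourth-order tensor, without ever building that tensor.

```python
import random


def full_trilinear(q, k, v, T):
    """Dense trilinear map: z[o] = sum_{i,j,l} T[i][j][l][o] * q[i] * k[j] * v[l]."""
    d_out = len(T[0][0][0])
    z = [0.0] * d_out
    for i, qi in enumerate(q):
        for j, kj in enumerate(k):
            for l, vl in enumerate(v):
                w = qi * kj * vl
                for o in range(d_out):
                    z[o] += T[i][j][l][o] * w
    return z


def project(x, W):
    """Vector-matrix product x @ W for x of length d and W of shape (d, n)."""
    return [sum(x[i] * W[i][c] for i in range(len(x))) for c in range(len(W[0]))]


def low_rank_trilinear(q, k, v, Wq, Wk, Wv, P):
    """Rank-R fusion (assumed CP form): project each modality to R dims,
    multiply elementwise, then map the R fused values to the output with P."""
    a, b, c = project(q, Wq), project(k, Wk), project(v, Wv)
    fused = [a[r] * b[r] * c[r] for r in range(len(a))]
    return project(fused, P)


def cp_to_full(Wq, Wk, Wv, P):
    """Build the dense tensor implied by the factors (only to check the identity)."""
    rank, d_out = len(P), len(P[0])
    return [[[[sum(Wq[i][r] * Wk[j][r] * Wv[l][r] * P[r][o] for r in range(rank))
               for o in range(d_out)]
              for l in range(len(Wv))]
             for j in range(len(Wk))]
            for i in range(len(Wq))]


def rand_matrix(rows, cols, rng):
    """Random (rows x cols) matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_q, d_k, d_v, rank, d_out = 4, 3, 5, 2, 6  # toy sizes, chosen arbitrarily
    Wq, Wk, Wv = (rand_matrix(d, rank, rng) for d in (d_q, d_k, d_v))
    P = rand_matrix(rank, d_out, rng)
    q, k, v = ([rng.uniform(-1.0, 1.0) for _ in range(d)] for d in (d_q, d_k, d_v))
    dense = full_trilinear(q, k, v, cp_to_full(Wq, Wk, Wv, P))
    fast = low_rank_trilinear(q, k, v, Wq, Wk, Wv, P)
    print(max(abs(x - y) for x, y in zip(dense, fast)))
```

Under this assumed form, the dense tensor needs d_q * d_k * d_v * d_out parameters, while the factorized version needs R * (d_q + d_k + d_v + d_out), which is why such a decomposition makes high-dimensional trilinear fusion tractable.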
Pages: 780-791
Number of pages: 12
Related papers
50 in total
  • [21] Multi-hop Question Answering with Knowledge Graph Embedding in a Similar Semantic Space
    Li, Fengying
    Chen, Mingdong
    Dong, Rongsheng
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [22] Question Answering over Knowledgebase with Attention-based LSTM Networks and Knowledge Embedding
    Chen, Lin
    Zeng, Guanping
    Zhang, Qingchuan
    Chen, Xingyu
    Wu, Danfeng
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 243 - 246
  • [23] User Embedding for Expert Finding in Community Question Answering
    Ghasemi, Negin
    Fatourechi, Ramin
    Momtazi, Saeedeh
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2021, 15 (04)
  • [24] Exploiting hierarchical visual features for visual question answering
    Hong, Jongkwang
    Fu, Jianlong
    Uh, Youngjung
    Mei, Tao
    Byun, Hyeran
    NEUROCOMPUTING, 2019, 351 : 187 - 195
  • [25] Trilinear Distillation Learning and Question Feature Capturing for Medical Visual Question Answering
    Long, Shaopei
    Li, Yong
    Weng, Heng
    Tang, Buzhou
    Wang, Fu Lee
    Hao, Tianyong
    NEURAL COMPUTING FOR ADVANCED APPLICATIONS, NCAA 2024, PT III, 2025, 2183 : 162 - 177
  • [26] Quantum Language Model With Entanglement Embedding for Question Answering
    Chen, Yiwei
    Pan, Yu
    Dong, Daoyi
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (06) : 3467 - 3478
  • [27] BERT with History Answer Embedding for Conversational Question Answering
    Qu, Chen
    Yang, Liu
    Qiu, Minghui
    Croft, W. Bruce
    Zhang, Yongfeng
    Iyyer, Mohit
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1133 - 1136
  • [28] Query Answering to IQ Test Questions Using Word Embedding
    Frackowiak, Michal
    Dutkiewicz, Jakub
    Jedrzejek, Czeslaw
    Retinger, Marek
    Werda, Pawel
    MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, MISSI 2016, 2017, 506 : 283 - 294
  • [29] Parallel multi-head attention and term-weighted question embedding for medical visual question answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (22) : 34937 - 34958