Exploiting Query Knowledge Embedding and Trilinear Joint Embedding for Visual Question Answering

Times cited: 0
Authors
Chen, Zheng [1 ]
Wen, Yaxin [1 ]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Informat & Software Engn, Chengdu 611731, Sichuan, Peoples R China
Keywords
Visual question answering; Attention mechanism; Knowledge base; Joint embedding;
DOI
10.1007/978-981-99-4752-2_64
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) aims to answer natural language questions about a given image. Researchers generally believe that incorporating external knowledge can improve performance on the VQA task. However, existing methods are limited in how they acquire and use such knowledge, which prevents them from effectively enhancing a model's question-answering capability. In this paper, we propose a novel VQA approach that uses question-based queries for Knowledge Embedding. In our approach, we design question query rules to obtain critical external knowledge and then embed this knowledge by integrating it with the question as the input features for the text modality. Traditional multimodal feature fusion techniques rely solely on local features, which may result in the loss of global information. To address this issue, we introduce a feature fusion method based on Trilinear Joint Embedding. Using an attention mechanism, we generate a feature matrix composed of question, knowledge, and image components. This matrix is then trilinearly joint embedded to form a novel global feature vector. Because the high-dimensional vectors produced during trilinear joint embedding are computationally expensive, we employ Tensor Decomposition to break this vector down into a sum of several low-rank tensors. We then feed the global feature vector into a classifier, treating answer prediction as a multi-class classification problem. Experimental results on the VQAv2, OKVQA, and VizWiz public datasets demonstrate that our approach achieves accuracy improvements of 1.78%, 3.95%, and 1.16%, respectively. Our code is available at https://github.com/yxNoth/KB-VLT.
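The low-rank idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic CP-style (sum of rank-1 terms) factorization, and every name in it (`low_rank_trilinear`, `full_trilinear`, `Wq`, `Wk`, `Wv`, `P`, the dimensions, the random inputs) is hypothetical. The sketch checks that fusing question, knowledge, and image vectors through the factors gives the same result as contracting them with the full interaction tensor.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def low_rank_trilinear(q, k, v, Wq, Wk, Wv, P):
    """Fuse three vectors using only the low-rank factors.

    Each rank-1 component r contributes the scalar
    (Wq[r] . q) * (Wk[r] . k) * (Wv[r] . v); P mixes these scalars
    into the output vector.
    """
    rank = len(Wq)
    gates = [dot(Wq[r], q) * dot(Wk[r], k) * dot(Wv[r], v) for r in range(rank)]
    return [dot(p_row, gates) for p_row in P]


def full_trilinear(q, k, v, Wq, Wk, Wv, P):
    """Reference computation against the full tensor
    T[o][i][j][l] = sum_r P[o][r] * Wq[r][i] * Wk[r][j] * Wv[r][l].
    """
    rank = len(Wq)
    out = []
    for p_row in P:
        total = 0.0
        for i, qi in enumerate(q):
            for j, kj in enumerate(k):
                for l, vl in enumerate(v):
                    t = sum(p_row[r] * Wq[r][i] * Wk[r][j] * Wv[r][l]
                            for r in range(rank))
                    total += t * qi * kj * vl
        out.append(total)
    return out


# Assumed toy sizes: question, knowledge, image, rank, output.
rng = random.Random(0)
dq, dk, dv, rank, dout = 4, 3, 5, 2, 3

q = [rng.uniform(-1.0, 1.0) for _ in range(dq)]   # question feature
k = [rng.uniform(-1.0, 1.0) for _ in range(dk)]   # knowledge feature
v = [rng.uniform(-1.0, 1.0) for _ in range(dv)]   # image feature

Wq = random_matrix(rank, dq, rng)
Wk = random_matrix(rank, dk, rng)
Wv = random_matrix(rank, dv, rng)
P = random_matrix(dout, rank, rng)

z_low = low_rank_trilinear(q, k, v, Wq, Wk, Wv, P)
z_full = full_trilinear(q, k, v, Wq, Wk, Wv, P)

# Parameter counts for the two forms.
full_params = dout * dq * dk * dv
low_rank_params = rank * (dq + dk + dv) + dout * rank
```

With these toy sizes the full tensor would hold 180 weights, while the factors hold 30, and the two outputs agree up to floating-point error. In a real model the factors would be learned and the inputs would come from the attention stage described in the abstract.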
Pages: 780-791
Page count: 12
Related papers
50 in total
  • [1] Exploiting Sentence Embedding for Medical Question Answering
    Hao, Yu
    Liu, Xien
    Wu, Ji
    Lv, Ping
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 938 - 945
  • [2] Knowledge Graph Embedding Based Question Answering
    Huang, Xiao
    Zhang, Jingyuan
    Li, Dingcheng
    Li, Ping
    PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 105 - 113
  • [3] An Enhanced Term Weighted Question Embedding for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2022, 21 (02)
  • [4] Multi visual and textual embedding on visual question answering for blind people
    Tung Le
    Huy Tien Nguyen
    Minh Le Nguyen
    NEUROCOMPUTING, 2021, 465 : 451 - 464
  • [5] Embedding Spatial Relations in Visual Question Answering for Remote Sensing
    Faure, Maxime
    Lobry, Sylvain
    Kurtz, Camille
    Wendling, Laurent
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 310 - 316
  • [6] MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
    Heng ZHANG
    Zhihua WEI
    Guanming LIU
    Rui WANG
    Ruibin MU
    Chuanbao LIU
    Aiquan YUAN
    Guodong CAO
    Ning HU
    Virtual Reality & Intelligent Hardware, 2024, 6 (04) : 280 - 291
  • [7] MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
    Zhang, Heng
    Wei, Zhihua
    Liu, Guanming
    Wang, Rui
    Mu, Ruibin
    Liu, Chuanbao
    Yuan, Aiquan
    Cao, Guodong
    Hu, Ning
    Virtual Reality and Intelligent Hardware, 6 (04) : 280 - 291
  • [8] Compact Trilinear Interaction for Visual Question Answering
    Tuong Do
    Thanh-Toan Do
    Huy Tran
    Tjiputra, Erman
    Tran, Quang D.
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 392 - 401
  • [9] Knowledge-based question answering using the semantic embedding space
    Yang, Min-Chul
    Lee, Do-Gil
    Park, So-Young
    Rim, Hae-Chang
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (23) : 9086 - 9104
  • [10] A Question Embedding-based Method to Enrich Features for Knowledge Base Question Answering
    Wang, Xin
    Lin, Meng
    Lu, Qianqian
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2851 - 2855