MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering

Authors
Heng ZHANG [1]
Zhihua WEI [1]
Guanming LIU [1]
Rui WANG [1]
Ruibin MU [2]
Chuanbao LIU [2]
Aiquan YUAN [2]
Guodong CAO [2]
Ning HU [2]
Affiliations
[1] Tongji University
[2] Alibaba
DOI: not available
Abstract
Background: External knowledge representations play an essential role in knowledge-based visual question answering, where they help models understand complex open-world scenarios. Recent entity-relationship embedding approaches represent some complex relations poorly, which leaves them short of topic-related knowledge and burdened with redundant topic-irrelevant information.

Methods: To address this, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. Two losses, a range loss and an orthogonal loss, are proposed to learn the triplet representations from complementary views: they keep the lengths of the feature vectors projected onto the hyperplane comparable and filter out enough topic-irrelevant information. To interpret the model's capability to extract topic-related knowledge, we introduce the Topic Similarity (TS) between a topic and its entity-relations.

Results: Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperformed state-of-the-art methods by 2.12% and 3.24% on two challenging knowledge-requiring datasets, OK-VQA and KRVQA, respectively.

Conclusions: The clear advantage of our model in TS shows that using hyperplane embedding to represent multimodal knowledge can improve its ability to extract topic-related knowledge.
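
The abstract names hyperplane embedding, a range loss and an orthogonal loss, but does not give their formulas. The sketch below is therefore only an illustration under stated assumptions: it borrows the well-known TransH-style projection onto a relation hyperplane, and the functions `orthogonal_penalty` and `range_penalty`, along with their exact forms and the `target` parameter, are hypothetical stand-ins rather than the losses defined in the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def project(v, w):
    """Project v onto the hyperplane whose unit normal is w (TransH-style)."""
    scale = dot(w, v)
    return [x - scale * n for x, n in zip(v, w)]

def triplet_score(h, d_r, t, w):
    """Distance ||h_proj + d_r - t_proj||; lower means a more plausible triplet."""
    h_p, t_p = project(h, w), project(t, w)
    return norm([a + r - b for a, r, b in zip(h_p, d_r, t_p)])

def orthogonal_penalty(d_r, w):
    """ASSUMED form: squared cosine between the relation vector d_r and the
    unit normal w. It is zero when d_r lies inside the hyperplane."""
    return (dot(w, d_r) / norm(d_r)) ** 2

def range_penalty(v_proj, target=1.0):
    """ASSUMED form: penalize a projected vector whose length drifts from a
    shared target, so projected lengths stay comparable."""
    return (norm(v_proj) - target) ** 2

# Toy 3-D example: the hyperplane is the x-y plane (normal along z).
w = [0.0, 0.0, 1.0]      # unit normal of the relation hyperplane
h = [1.0, 0.0, 5.0]      # head entity embedding
t = [1.0, 1.0, -2.0]     # tail entity embedding
d_r = [0.0, 1.0, 0.0]    # relation translation vector, inside the hyperplane

print(project(h, w))                    # z component is removed
print(triplet_score(h, d_r, t, w))      # 0.0 for this perfectly matching triplet
print(orthogonal_penalty(d_r, w))       # 0.0 because d_r is orthogonal to w
print(range_penalty(project(t, w)))     # nonzero: projected length is sqrt(2)
```

In this toy example the z components of the head and tail embeddings differ widely, yet the triplet still scores as a perfect match because the projection discards everything along the normal. That is the sense in which a hyperplane can filter out information irrelevant to a given relation.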
Pages: 12
Related papers
  • [1] MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering
    Zhang, Heng
    Wei, Zhihua
    Liu, Guanming
    Wang, Rui
    Mu, Ruibin
    Liu, Chuanbao
    Yuan, Aiquan
    Cao, Guodong
    Hu, Ning
    [J]. Virtual Reality and Intelligent Hardware, 2024, 6 (04) : 280 - 291
  • [2] MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
    Ding, Yang
    Yu, Jing
    Liu, Bang
    Hu, Yue
    Cui, Mingxin
    Wu, Qi
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 5079 - 5088
  • [3] Multimodal Inverse Cloze Task for Knowledge-Based Visual Question Answering
    Lerner, Paul
    Ferret, Olivier
    Guinaudeau, Camille
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 569 - 587
  • [4] Knowledge-based question answering
    Rinaldi, F
    Dowdall, J
    Hess, M
    Mollá, D
    Schwitter, R
    Kaljurand, K
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 785 - 792
  • [5] Knowledge-based question answering
    Hermjakob, U
    Hovy, EH
    Lin, CY
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002 : 66 - 71
  • [6] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 1290 - 1296
  • [7] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208
  • [8] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Su, Zhenqiang
    Gou, Gang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2193 - 2208
  • [9] Knowledge-based question answering using the semantic embedding space
    Yang, Min-Chul
    Lee, Do-Gil
    Park, So-Young
    Rim, Hae-Chang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (23) : 9086 - 9104
  • [10] Visual Question Answering based on multimodal triplet knowledge accumuation
    Wang, Fengjuan
    An, Gaoyun
    [J]. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022 : 81 - 84