Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning

Cited by: 0
Authors
Su, Zhenqiang [1]
Gou, Gang [1]
Affiliations
[1] State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Keywords
Knowledge representation; Semantics; Visual languages
DOI
10.3778/j.issn.1002-8331.2209-0456
Abstract
As a multimodal task, visual question answering (VQA) requires fusing and reasoning over features from different modalities, and it has important application value. In traditional VQA, the answer can usually be inferred from the visual information in the image alone. However, visual information by itself cannot meet the diverse question-answering needs of real-world scenarios, where knowledge plays an important supporting role. Knowledge-based open VQA must associate external knowledge with the image and the question to achieve cross-modal scene understanding. To better integrate visual information with relevant external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, together with a dual-guided attention module in which image features and question features jointly guide the knowledge representation. First, the model uses a pre-trained vision-language model to obtain feature representations of the question and the image, along with visual reasoning information. Second, a similarity matrix is used to compute the image object regions that are semantically aligned with the question, and the aligned region features and the question features then jointly guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to produce the final answer. Experimental results on the OK-VQA dataset show that the model's accuracy is 1.97 and 4.82 percentage points higher than that of the two baseline methods, respectively, which verifies the effectiveness of the model. © 2016 Chinese Medical Journals Publishing House Co.Ltd. All rights reserved.
Pages: 95-102
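
The three steps named in the abstract (question-to-region alignment through a similarity matrix, dual-guided attention over knowledge, and fusion of the two reasoning streams) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the max-over-tokens region scoring, the additive combination of the two guides, and the element-wise product used in place of the bilinear structure are all assumptions chosen for brevity. The feature vectors are hand-written placeholders rather than outputs of a pre-trained vision-language model.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def align_regions(question_tokens, regions):
    """Score image regions against the question via a similarity matrix.

    sim[i][j] is the inner product of question token i and region j.
    Assumption: a region's score is its best match over all tokens.
    """
    sim = [[dot(tok, reg) for reg in regions] for tok in question_tokens]
    region_scores = [max(row[j] for row in sim) for j in range(len(regions))]
    weights = softmax(region_scores)
    return weights, weighted_sum(weights, regions)


def dual_guided_attention(question_vec, region_vec, knowledge):
    """Attend over knowledge entries using two guides at once.

    Assumption: the question guide and the aligned-region guide
    contribute additively to each knowledge entry's score.
    """
    scores = [dot(question_vec, k) + dot(region_vec, k) for k in knowledge]
    weights = softmax(scores)
    return weights, weighted_sum(weights, knowledge)


def fuse(visual_vec, knowledge_vec):
    """Element-wise product, a simple stand-in for bilinear fusion."""
    return [v * k for v, k in zip(visual_vec, knowledge_vec)]


# Toy 2-dimensional features (hypothetical values).
question_tokens = [[1.0, 0.0], [0.0, 1.0]]
regions = [[2.0, 0.0], [0.1, 0.1], [0.0, 0.5]]
knowledge = [[1.0, 0.0], [0.0, 1.0]]
question_vec = [0.5, 0.5]  # mean of the token vectors
visual_vec = [1.0, 1.0]    # placeholder for the visual reasoning output

region_weights, region_vec = align_regions(question_tokens, regions)
knowledge_weights, knowledge_vec = dual_guided_attention(
    question_vec, region_vec, knowledge
)
fused = fuse(visual_vec, knowledge_vec)
print(region_weights, knowledge_weights, fused)
```

In this toy setup the first region matches a question token most strongly, so it dominates the aligned region feature, which in turn pulls the knowledge attention toward the first knowledge entry. A real system would replace the fixed vectors with learned projections and feed the fused vector to an answer classifier.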