Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning

Cited by: 0

Authors:

Su, Zhenqiang [1]

Gou, Gang [1]

Affiliation:

[1] State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

Source:

Computer Engineering and Applications | 2024 / Vol. 60 / No. 05

Keywords:

Knowledge representation; Semantics; Visual languages

DOI:

10.3778/j.issn.1002-8331.2209-0456


Abstract:

As a task in the multimodal field, visual question answering (VQA) requires fusing and reasoning over features from different modalities, and it has important application value. In traditional VQA, the answer can be inferred from the visual information of the image alone. However, purely visual information cannot meet the diverse question-answering needs of real-world scenarios, where knowledge plays an important role in assisting question answering. Knowledge-based open VQA needs to associate external knowledge to achieve cross-modal scene understanding. To better integrate visual information with related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, and a dual-guided attention module is designed in which image features and question features guide the knowledge representation. First, the model uses a pre-trained vision-language model to obtain feature representations of the question and image together with visual reasoning information. Second, a similarity matrix is used to compute the image object regions that are semantically aligned with the question, and the aligned regional features and the question features then jointly guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to obtain the final answer. Experimental results on the OK-VQA dataset show that the accuracy of the model is 1.97 and 4.82 percentage points higher than that of two baseline methods, respectively, which verifies the effectiveness of the model. © 2016 Chinese Medical Journals Publishing House Co. Ltd. All rights reserved.
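The three stages described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the abstract does not give layer definitions, so the scaled dot-product similarity, the max-pooling over question tokens, the sum of the two guided attention passes, and the element-wise product used for fusion are all assumptions, as are every function name and tensor shape below.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def align_regions(question_tokens, regions):
    # Similarity matrix sim[i, j] between question token i and image region j
    # (scaled dot product is an assumption; the abstract only says "similarity matrix").
    sim = question_tokens @ regions.T / np.sqrt(regions.shape[1])
    # Score each region by its best-matching question token, then normalize.
    region_weights = softmax(sim.max(axis=0))
    return region_weights[:, None] * regions, region_weights

def guided_attention(query, knowledge):
    # Attend over knowledge entries using a single guiding vector.
    weights = softmax(knowledge @ query / np.sqrt(query.shape[0]))
    return weights @ knowledge

def knowledge_reasoning(question_tokens, regions, knowledge):
    aligned, _ = align_regions(question_tokens, regions)
    image_guide = aligned.sum(axis=0)            # pooled question-aligned regions
    question_guide = question_tokens.mean(axis=0)
    # Dual guidance (assumed form): one pass guided by the image, one by the question.
    return guided_attention(image_guide, knowledge) + guided_attention(question_guide, knowledge)

def fuse(visual_info, knowledge_info, w_v, w_k):
    # Element-wise product of two linear projections, a simple bilinear-style fusion.
    return (visual_info @ w_v) * (knowledge_info @ w_k)

# Toy inputs: 5 question tokens, 6 image regions, 10 knowledge entries, feature size 8.
rng = np.random.default_rng(0)
d, h = 8, 4
question_tokens = rng.normal(size=(5, d))
regions = rng.normal(size=(6, d))
knowledge = rng.normal(size=(10, d))
visual_info = rng.normal(size=d)  # stand-in for the vision-language model's output

knowledge_info = knowledge_reasoning(question_tokens, regions, knowledge)
fused = fuse(visual_info, knowledge_info, rng.normal(size=(d, h)), rng.normal(size=(d, h)))
print(fused.shape)
```

In a real system the random arrays would be replaced by features from a pre-trained vision-language model and a knowledge retriever, and the fused vector would feed an answer classifier; those components are outside the scope of this sketch.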


Pages: 95 / 102

