Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning

Cited by: 0

Authors:

Su, Zhenqiang [1]

Gou, Gang [1]

Affiliation:

[1] State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

Source:

Computer Engineering and Applications | 2024, Vol. 60, No. 05

Keywords:

Knowledge representation; Semantics; Visual languages

DOI:

10.3778/j.issn.1002-8331.2209-0456


Abstract:

As a multimodal task, visual question answering (VQA) requires fusing and reasoning over features from different modalities, and it has important application value. In traditional VQA, the answer can be inferred from the visual information in the image alone. However, purely visual information cannot meet the diverse question-answering needs of real-world scenarios, where knowledge plays an important supporting role. Knowledge-based open VQA must associate external knowledge with the image to achieve cross-modal scene understanding. To better integrate visual information with related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, and a dual-guided attention module is designed in which image features and question features jointly guide the knowledge representation. First, the model uses a pre-trained vision-language model to obtain feature representations of the question and image, together with visual reasoning information. Second, a similarity matrix is used to compute the image object regions that are semantically aligned with the question, and the aligned regional features, together with the question features, jointly guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to produce the final answer. Experimental results on the OK-VQA dataset show that the model's accuracy is 1.97 and 4.82 percentage points higher than that of two baseline methods, respectively, which verifies the effectiveness of the model. © 2016 Chinese Medical Journals Publishing House Co. Ltd. All rights reserved.

Pages: 95-102
