Research on visual question answering based on dynamic memory network model of multiple attention mechanisms

Cited by: 0

Authors:

Yalin Miao

Shuyun He

WenFang Cheng

Guodong Li

Meng Tong

Affiliation:

[1] Xi’an University of Technology, Department of Information Science

Source:

Scientific Reports, Volume 12

Abstract:

Existing visual question answering models lack long-term memory modules for answering complex questions, so useful information is easily lost. To further improve the accuracy of visual question answering, this paper applies a multiple attention mechanism that combines channel attention and spatial attention to memory networks for the first time, and proposes a dynamic memory network model based on this mechanism (DMN-MA). In the situational memory module, the model uses the multiple attention mechanism to obtain the visual vectors most relevant to the question through continuous memory updating, storage, and iterative inference over the question, and it makes effective use of contextual information for answer inference. Experimental results show that the model reaches an accuracy of 64.57% on COCO-QA and 67.18% on VQA2.0, both large-scale public datasets.
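The abstract does not give the model's equations, so the following is only a toy sketch of the general idea it describes: channel attention followed by spatial attention inside a memory that is updated over several passes. Every function name, the sigmoid channel gate, the dot-product region score, and the tanh memory update are assumptions made for illustration; they are not taken from the DMN-MA paper, and a real model would use learned weights rather than these fixed rules.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(regions, memory):
    # Assumed form: gate each channel by how well its average
    # activation over all regions agrees with the memory vector.
    n = len(regions)
    gates = []
    for c in range(len(memory)):
        pooled = sum(r[c] for r in regions) / n
        gates.append(sigmoid(pooled * memory[c]))
    return [[r[c] * gates[c] for c in range(len(memory))] for r in regions]

def spatial_attention(regions, memory):
    # Assumed form: softmax over regions of dot(region, memory),
    # then a weighted sum of region vectors as the context.
    scores = [sum(a * b for a, b in zip(r, memory)) for r in regions]
    weights = softmax(scores)
    context = [
        sum(w * r[c] for w, r in zip(weights, regions))
        for c in range(len(memory))
    ]
    return weights, context

def iterate_memory(regions, question, hops=3):
    # The memory starts as the question vector and is refined once per hop.
    memory = list(question)
    weights = []
    for _ in range(hops):
        gated = channel_attention(regions, memory)
        weights, context = spatial_attention(gated, memory)
        # Assumed update rule: residual tanh, standing in for a learned layer.
        memory = [math.tanh(m + c) for m, c in zip(memory, context)]
    return memory, weights

# Three image regions with three channels each (hypothetical features).
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
question = [0.0, 2.0, 0.0]  # hypothetical question encoding
memory, weights = iterate_memory(regions, question)
```

In this toy setup the question vector points along the second channel, so the spatial weights should favour the second region after each pass, and the final memory could then be handed to an answer classifier.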

Citations

50 entries in total

[1] Research on visual question answering based on dynamic memory network model of multiple attention mechanisms
Miao, Yalin
He, Shuyun
Cheng, WenFang
Li, Guodong
Tong, Meng
SCIENTIFIC REPORTS, 2022, 12 (01)
[2] Dynamic Co-attention Network for Visual Question Answering
Ebaid, Doaa B.
Madbouly, Magda M.
El-Zoghabi, Adel A.
2021 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2021), 2021, : 125 - 129
[3] Visual question answering model based on graph neural network and contextual attention
Sharma, Himanshu
Jalal, Anand Singh
IMAGE AND VISION COMPUTING, 2021, 110
[4] Path-Wise Attention Memory Network for Visual Question Answering
Xiang, Yingxin
Zhang, Chengyuan
Han, Zhichao
Yu, Hao
Li, Jiaye
Zhu, Lei
MATHEMATICS, 2022, 10 (18)
[5] MDAnet: Multiple Fusion Network with Double Attention for Visual Question Answering
Feng, Junyi
Gong, Ping
Qiu, Guanghui
ICVIP 2019: PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, 2019, : 143 - 147
[6] Dynamic Capsule Attention for Visual Question Answering
Zhou, Yiyi
Ji, Rongrong
Su, Jinsong
Sun, Xiaoshuai
Chen, Weiqiu
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9324 - 9331
[7] Co-attention Network for Visual Question Answering Based on Dual Attention
Dong, Feng
Wang, Xiaofeng
Oad, Ammar
Talpur, Mir Sajjad Hussain
Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
[8] Collaborative Attention Network to Enhance Visual Question Answering
Gu, Rui
BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
[9] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
Gu, Geonmo
Kim, Seong Tae
Ro, Yong Man
2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002
[10] Triple attention network for sentimental visual question answering
Ruwa, Nelson
Mao, Qirong
Song, Heping
Jia, Hongjie
Dong, Ming
COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189
