Research on visual question answering based on dynamic memory network model of multiple attention mechanisms

Authors
Yalin Miao
Shuyun He
WenFang Cheng
Guodong Li
Meng Tong
Affiliation
[1] Xi’an University of Technology, Department of Information Science
Source
Scientific Reports, 2022, 12 (01)
DOI
Not available
Abstract
Existing visual question answering (VQA) models lack long-term memory modules for answering complex questions, so effective information is easily lost. To further improve VQA accuracy, this paper applies a multiple attention mechanism that combines channel attention and spatial attention to memory networks for the first time, and proposes a dynamic memory network model based on this mechanism (DMN-MA). In its situational memory module, the model uses the multiple attention mechanism to obtain the visual vectors most relevant to answering the question, through continuous memory updating, storage, and iterative inference over the question, and it makes effective use of contextual information for answer inference. Experimental results show that the model reaches an accuracy of 64.57% on COCO-QA and 67.18% on VQA2.0, both large-scale public datasets.
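The abstract gives no equations, so the following is only a rough sketch of the general idea it describes: channel attention re-weights feature channels, spatial attention selects image regions, and a memory vector is refined over several passes. It is not the authors' DMN-MA implementation. All function names, weight matrices, shapes, and the number of passes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(feats, question, w_c):
    # feats: (R, C) region features, question: (C,), w_c: (2C, C) (assumed shapes).
    pooled = feats.mean(axis=0)                       # average over regions
    logits = np.concatenate([pooled, question]) @ w_c
    gate = 1.0 / (1.0 + np.exp(-logits))              # sigmoid gate per channel
    return feats * gate                               # re-weight every channel

def spatial_attention(feats, question, memory):
    # Score each region against the question and the current memory.
    scores = feats @ (question + memory) / np.sqrt(feats.shape[1])
    weights = softmax(scores)                         # (R,), sums to 1
    return weights @ feats, weights                   # attended visual vector

def iterate_memory(feats, question, w_c, w_m, passes=3):
    # Hypothetical memory update: w_m has shape (3C, C).
    memory = question.copy()
    for _ in range(passes):
        gated = channel_attention(feats, question, w_c)
        context, weights = spatial_attention(gated, question, memory)
        memory = np.tanh(np.concatenate([memory, context, question]) @ w_m)
    return memory, weights

# Usage with random stand-in data (5 regions, 8 channels).
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
question = rng.normal(size=8)
w_c = rng.normal(size=(16, 8))
w_m = rng.normal(size=(24, 8))
memory, weights = iterate_memory(feats, question, w_c, w_m)
```

In a trained model the weight matrices would be learned and the final memory would feed an answer classifier; here they are random, so the output only demonstrates the data flow.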
Related papers
50 in total
  • [1] Research on visual question answering based on dynamic memory network model of multiple attention mechanisms
    Miao, Yalin
    He, Shuyun
    Cheng, WenFang
    Li, Guodong
    Tong, Meng
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [2] Dynamic Co-attention Network for Visual Question Answering
    Ebaid, Doaa B.
    Madbouly, Magda M.
    El-Zoghabi, Adel A.
    2021 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2021), 2021, : 125 - 129
  • [3] Visual question answering model based on graph neural network and contextual attention
    Sharma, Himanshu
    Jalal, Anand Singh
    IMAGE AND VISION COMPUTING, 2021, 110
  • [4] Path-Wise Attention Memory Network for Visual Question Answering
    Xiang, Yingxin
    Zhang, Chengyuan
    Han, Zhichao
    Yu, Hao
    Li, Jiaye
    Zhu, Lei
    MATHEMATICS, 2022, 10 (18)
  • [5] MDAnet: Multiple Fusion Network with Double Attention for Visual Question Answering
    Feng, Junyi
    Gong, Ping
    Qiu, Guanghui
    ICVIP 2019: PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, 2019, : 143 - 147
  • [6] Dynamic Capsule Attention for Visual Question Answering
    Zhou, Yiyi
    Ji, Rongrong
    Su, Jinsong
    Sun, Xiaoshuai
    Chen, Weiqiu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9324 - 9331
  • [7] Co-attention Network for Visual Question Answering Based on Dual Attention
    Dong, Feng
    Wang, Xiaofeng
    Oad, Ammar
    Talpur, Mir Sajjad Hussain
    Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
  • [8] Collaborative Attention Network to Enhance Visual Question Answering
    Gu, Rui
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
  • [9] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002
  • [10] Triple attention network for sentimental visual question answering
    Ruwa, Nelson
    Mao, Qirong
    Song, Heping
    Jia, Hongjie
    Dong, Ming
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189