Visual question answering model based on graph neural network and contextual attention

Cited by: 35
Authors
Sharma, Himanshu [1]
Jalal, Anand Singh [1]
Affiliations
[1] GLA Univ Mathura, Dept Comp Engn & Applicat, Mathura, India
Keywords
Visual question answering; Computer vision; Natural language processing; Attention;
DOI
10.1016/j.imavis.2021.104165
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) has recently emerged as an active research area in computer vision and natural language processing. A VQA model extracts image and question features and fuses them to predict an answer to a natural-language question about an image. However, most attention-based VQA approaches concentrate on extracting visual information from regions of interest for answer prediction and ignore both the relations between those regions and the reasoning among them. In addition, these approaches ignore the regions that were previously attended to during answer generation, even though such regions can guide the selection of subsequent regions of attention. This paper presents a novel VQA model that exploits the relationships between regions and employs visual-context-based attention that takes previously attended visual content into account. Experimental results demonstrate that the proposed VQA model improves answer-prediction accuracy on the publicly available VQA 1.0 and VQA 2.0 datasets. © 2021 Elsevier B.V. All rights reserved.
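The two ideas named in the abstract, relating regions to one another and letting previously attended content steer the next attention step, can be illustrated with a toy sketch. This is not the authors' architecture: the function names (`relate_regions`, `contextual_attention`), the dot-product scoring, the additive query, and the fixed number of steps are all assumptions chosen only to make the idea concrete in plain Python.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relate_regions(regions):
    """Hypothetical one-round message passing over a fully connected
    region graph: each region adds a similarity-weighted average of
    the other regions to its own features."""
    if len(regions) < 2:
        return [list(r) for r in regions]
    updated = []
    for i, r in enumerate(regions):
        others = [o for j, o in enumerate(regions) if j != i]
        weights = softmax([dot(r, o) for o in others])
        message = [sum(w * o[k] for w, o in zip(weights, others))
                   for k in range(len(r))]
        updated.append([a + b for a, b in zip(r, message)])
    return updated

def contextual_attention(regions, question, steps=2):
    """Hypothetical context-based attention: at each step the query is
    the question vector plus the features attended at the previous step,
    so earlier attention influences which regions are selected next."""
    dim = len(question)
    context = [0.0] * dim  # nothing has been attended before step one
    history = []
    for _ in range(steps):
        query = [q + c for q, c in zip(question, context)]
        weights = softmax([dot(r, query) for r in regions])
        context = [sum(w * r[k] for w, r in zip(weights, regions))
                   for k in range(dim)]
        history.append(weights)
    return context, history

# Toy usage: three 2-D region features and a 2-D question vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]
context, history = contextual_attention(relate_regions(regions), question)
```

In a real model the region features would come from an object detector, the question vector from a text encoder, and all scoring functions would be learned; the sketch only shows how a running context can feed back into the attention query.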
Pages: 11