Visual question answering model based on graph neural network and contextual attention

被引:35
|
作者
Sharma, Himanshu [1 ]
Jalal, Anand Singh [1 ]
机构
[1] GLA Univ Mathura, Dept Comp Engn & Applicat, Mathura, India
关键词
Visual question answering; Computer vision; Natural language processing; Attention;
D O I
10.1016/j.imavis.2021.104165
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Visual Question Answering (VQA) has recently appeared as a hot research area in the field of computer vision and natural language processing. A VQA model uses both image and question features and fuses them to predict an answer for a given natural question related to an image. However, most VQA approaches using attention mechanism mainly concentrate on extraction of visual information from regions of interests for answer prediction and ignore the relation between the regions of interests together with the reasoning among these regions. Apart from this limitation, VQA approaches also ignore the regions which are previously attended for answer generation. These regions which are attended in past can guide the selection of the subsequent regions of attention. In this paper, a novel VQA model is presented and formulated that utilizes this relationship between the regions and employs visual context based attention that takes into account the previously attended visual content. Experimental results demonstrate that the proposed VQA model boosts the accuracy of answer prediction on publically available datasets VQA 1.0 and VQA 2.0. (c) 2021 Elsevier B.V. All rights reserved.
引用
收藏
页数:11
相关论文
共 50 条
  • [1] Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
    Liang, Yaoyuan
    Wang, Xin
    Duan, Xuguang
    Zhu, Wenwu
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3491 - 3498
  • [2] Cascading Attention Visual Question Answering Model Based on Graph Structure
    Zhang, Haoyu
    Zhang, De
    [J]. Computer Engineering and Applications, 2023, 59 (06) : 155 - 161
  • [3] Deep Attention Neural Tensor Network for Visual Question Answering
    Bai, Yalong
    Fu, Jianlong
    Zhao, Tiejun
    Mei, Tao
    [J]. COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 21 - 37
  • [4] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [5] Co-attention graph convolutional network for visual question answering
    Chuan Liu
    Ying-Ying Tan
    Tian-Tian Xia
    Jiajing Zhang
    Ming Zhu
    [J]. Multimedia Systems, 2023, 29 : 2527 - 2543
  • [6] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [7] Visual Question Answering reasoning with external knowledge based on bimodal graph neural network
    Yang, Zhenyu
    Wu, Lei
    Wen, Peian
    Chen, Peng
    [J]. ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (04): : 1948 - 1965
  • [8] A multi-scale contextual attention network for remote sensing visual question answering
    Feng, Jiangfan
    Wang, Hui
    [J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 126
  • [9] Co-attention Network for Visual Question Answering Based on Dual Attention
    Dong, Feng
    Wang, Xiaofeng
    Oad, Ammar
    Talpur, Mir Sajjad Hussain
    [J]. Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
  • [10] Progressive Graph Attention Network for Video Question Answering
    Peng, Liang
    Yang, Shuangji
    Bin, Yi
    Wang, Guoqing
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2871 - 2879