Visual question answering model based on graph neural network and contextual attention

Cited by: 35

Authors:

Sharma, Himanshu [1]

Jalal, Anand Singh [1]

Affiliations:

[1] GLA Univ Mathura, Dept Comp Engn & Applicat, Mathura, India

Source:

IMAGE AND VISION COMPUTING | 2021 / Vol. 110

Keywords:

Visual question answering; Computer vision; Natural language processing; Attention;

DOI:

10.1016/j.imavis.2021.104165

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual Question Answering (VQA) has recently emerged as an active research area in computer vision and natural language processing. A VQA model extracts image and question features and fuses them to predict an answer to a natural-language question about an image. However, most attention-based VQA approaches concentrate on extracting visual information from regions of interest for answer prediction, and ignore both the relations between these regions and the reasoning among them. Beyond this limitation, VQA approaches also ignore the regions that were previously attended during answer generation. These previously attended regions can guide the selection of subsequent regions of attention. In this paper, a novel VQA model is presented and formulated that utilizes the relationships between regions and employs visual-context-based attention that takes into account the previously attended visual content. Experimental results demonstrate that the proposed VQA model improves the accuracy of answer prediction on the publicly available datasets VQA 1.0 and VQA 2.0. (c) 2021 Elsevier B.V. All rights reserved.

Pages: 11

50 items in total

[1] Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
Liang, Yaoyuan
Wang, Xin
Duan, Xuguang
Zhu, Wenwu
[J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3491 - 3498
[2] Cascading Attention Visual Question Answering Model Based on Graph Structure
Zhang, Haoyu
Zhang, De
[J]. Computer Engineering and Applications, 2023, 59 (06) : 155 - 161
[3] Deep Attention Neural Tensor Network for Visual Question Answering
Bai, Yalong
Fu, Jianlong
Zhao, Tiejun
Mei, Tao
[J]. COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 21 - 37
[4] Co-attention graph convolutional network for visual question answering
Liu, Chuan
Tan, Ying-Ying
Xia, Tian-Tian
Zhang, Jiajing
Zhu, Ming
[J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
[5] Co-attention graph convolutional network for visual question answering
Chuan Liu
Ying-Ying Tan
Tian-Tian Xia
Jiajing Zhang
Ming Zhu
[J]. Multimedia Systems, 2023, 29 : 2527 - 2543
[6] Relation-Aware Graph Attention Network for Visual Question Answering
Li, Linjie
Gan, Zhe
Cheng, Yu
Liu, Jingjing
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
[7] Visual Question Answering reasoning with external knowledge based on bimodal graph neural network
Yang, Zhenyu
Wu, Lei
Wen, Peian
Chen, Peng
[J]. ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (04) : 1948 - 1965
[8] A multi-scale contextual attention network for remote sensing visual question answering
Feng, Jiangfan
Wang, Hui
[J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 126
[9] Co-attention Network for Visual Question Answering Based on Dual Attention
Dong, Feng
Wang, Xiaofeng
Oad, Ammar
Talpur, Mir Sajjad Hussain
[J]. Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
[10] Progressive Graph Attention Network for Video Question Answering
Peng, Liang
Yang, Shuangji
Bin, Yi
Wang, Guoqing
[J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2871 - 2879
