Co-attention graph convolutional network for visual question answering

Cited by: 0
Authors
Chuan Liu
Ying-Ying Tan
Tian-Tian Xia
Jiajing Zhang
Ming Zhu
Affiliations
[1] Anhui Jianzhu University,School of Mathematics and Physics
[2] Anhui University,School of Integrated Circuits
[3] Anhui Jianzhu University,Operations Research and Data Science Laboratory
Source
Multimedia Systems | 2023, Vol. 29
Keywords
Visual question answering; Binary relational reasoning; Spatial graph convolution; Attention mechanism
DOI
Not available
Abstract
Visual Question Answering (VQA) is a challenging task that requires a fine-grained understanding of both the visual content of images and the textual content of questions. Conventional visual attention models, designed primarily from the perspective of the attention mechanism, lack the ability to reason about relationships between visual objects and ignore the multimodal interactions between questions and images. In this work, we propose a model that combines a graph convolutional network with a co-attention network to circumvent these problems. The model employs binary relational reasoning as the graph learner module to learn a graph structure representation that captures relationships between visual objects, and it learns a question-specific image representation with an awareness of spatial location via spatial graph convolution. After that, we perform parallel co-attention learning by passing the image representations and the question-word features through a deep co-attention module. Experimental results demonstrate that our model achieves an overall accuracy of 68.67% on the test-std set of the benchmark VQA v2.0 dataset, outperforming most existing models.
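The abstract describes a three-stage pipeline: a graph learner builds a relation graph over detected objects via binary relational reasoning, a spatial graph convolution produces relation-aware object representations conditioned on the question, and a deep co-attention module fuses question-word and image features. The paper record includes no code; the PyTorch sketch below is only a minimal illustration of how such a pipeline could be wired together. All class names, dimensions, and the simplified (non-spatial) adjacency are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch of a graph-learner -> graph-convolution -> co-attention
# pipeline for VQA. Module names and shapes are assumptions for illustration only;
# the published model additionally encodes spatial locations, which is omitted here.
import torch
import torch.nn as nn


class GraphLearner(nn.Module):
    """Learn a soft adjacency matrix over object features, conditioned on the question."""

    def __init__(self, obj_dim: int, q_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(obj_dim + q_dim, hidden_dim)

    def forward(self, objs: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # objs: (B, N, obj_dim), q: (B, q_dim) -> adjacency (B, N, N)
        q_tiled = q.unsqueeze(1).expand(-1, objs.size(1), -1)
        joint = torch.relu(self.proj(torch.cat([objs, q_tiled], dim=-1)))
        return torch.softmax(joint @ joint.transpose(1, 2), dim=-1)


class GraphConv(nn.Module):
    """One step of message passing A @ X @ W over the learned graph."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, objs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(adj @ objs))


class CoAttention(nn.Module):
    """Parallel co-attention: attend to question words and objects simultaneously."""

    def __init__(self, dim: int):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)

    def forward(self, words: torch.Tensor, objs: torch.Tensor):
        # words: (B, T, D), objs: (B, N, D)
        c = torch.tanh(self.affinity(words) @ objs.transpose(1, 2))      # (B, T, N)
        att_w = torch.softmax(c.max(dim=2).values, dim=1).unsqueeze(-1)  # word weights
        att_o = torch.softmax(c.max(dim=1).values, dim=1).unsqueeze(-1)  # object weights
        return (att_w * words).sum(dim=1), (att_o * objs).sum(dim=1)     # (B, D), (B, D)


if __name__ == "__main__":
    B, N, T, D = 2, 36, 14, 512           # batch, objects, question words, feature size
    objs = torch.randn(B, N, D)           # e.g. projected region features from a detector
    words = torch.randn(B, T, D)          # e.g. projected question-word embeddings
    q = words.mean(dim=1)                 # crude sentence-level question vector

    adj = GraphLearner(D, D, D)(objs, q)  # relation graph conditioned on the question
    objs = GraphConv(D)(objs, adj)        # relation-aware object features
    v, t = CoAttention(D)(words, objs)    # fused image and question summaries
    print(v.shape, t.shape)               # torch.Size([2, 512]) twice
```

A full model along the lines of the abstract would stack several such co-attention layers and feed the fused vectors to an answer classifier, and the spatial graph convolution would additionally condition the adjacency on the objects' bounding-box locations.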
Pages: 2527–2543
Page count: 16