Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

Cited by: 415
Authors
Xu, Huijuan [1]
Saenko, Kate [1]
Affiliation
[1] Boston Univ, Comp Sci, Boston, MA 02215 USA
Source
Keywords
Visual question answering; Spatial attention; Memory network; Deep learning
DOI
10.1007/978-3-319-46478-7_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on convolutional-recurrent networks to this problem, but have failed to model spatial inference. To remedy this, we propose a model we call the Spatial Memory Network and apply it to the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the information stored in memory. Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory, and uses attention to choose regions relevant for computing the answer. We propose a novel question-guided spatial attention architecture that looks for regions relevant to either individual words or the entire question, repeating the process over multiple recurrent steps, or "hops". To better understand the inference process learned by the network, we design synthetic questions that specifically require spatial inference and visualize the network's attention. We evaluate our model on two available visual question answering datasets and obtain improved results.
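The multi-hop attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a plain dot-product similarity between a question vector and each region vector, and it borrows the additive query update common to end-to-end memory networks. The names `attend` and `multi_hop`, the two-dimensional toy features, and the absence of learned embeddings are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(regions, query):
    """One hop: weight each region by its similarity to the query,
    then return the weights and the attention-weighted region summary."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(query)
    summary = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, summary

def multi_hop(regions, question, hops=2):
    """Repeat attention for several hops; after each hop the query is
    updated by adding the attended summary (an assumed update rule)."""
    query = list(question)
    history = []
    for _ in range(hops):
        weights, summary = attend(regions, query)
        history.append(weights)
        query = [q + s for q, s in zip(query, summary)]
    return query, history

# Toy "memory": three spatial regions with 2-d features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [2.0, 0.0]
final_query, history = multi_hop(regions, question, hops=2)
```

In this toy run, `history` holds one attention distribution over the three regions per hop. A real model would learn the embeddings for both the image regions and the question and would feed the final query into an answer classifier.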
Pages: 451-466
Page count: 16