Stacked Attention Networks for Image Question Answering

Cited by: 1166
Authors
Yang, Zichao [1]
He, Xiaodong [2]
Gao, Jianfeng [2]
Deng, Li [2]
Smola, Alex [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Redmond, WA 98052 USA
Source
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
DOI
10.1109/CVPR.2016.10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents stacked attention networks (SANs) that learn to answer natural language questions from images. SANs use the semantic representation of a question as a query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. We therefore develop a multiple-layer SAN that queries an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. Visualization of the attention layers illustrates the process by which the SAN locates, layer by layer, the relevant visual clues that lead to the answer to the question.
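The abstract describes the mechanism only in words: a question representation acts as a query over image regions, and the query is applied more than once. The sketch below illustrates that idea in plain Python. It is an assumption-laden toy, not the authors' implementation: the additive tanh scoring, the residual query update, the random weights, and every name (`attention_layer`, `stacked_attention`, `random_layer`) are hypothetical choices made for illustration, and region and question vectors are assumed to share one dimension.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_layer(regions, query, W_img, W_qry, w_score):
    """One attention step: score each region against the query,
    then fold the attended visual vector back into the query."""
    q_proj = matvec(W_qry, query)
    scores = []
    for v in regions:
        v_proj = matvec(W_img, v)
        # Assumed scoring form: additive combination followed by tanh.
        hidden = [math.tanh(a + b) for a, b in zip(v_proj, q_proj)]
        scores.append(sum(w * h for w, h in zip(w_score, hidden)))
    weights = softmax(scores)
    dim = len(query)
    # Attention-weighted sum of region vectors.
    attended = [sum(p * v[j] for p, v in zip(weights, regions)) for j in range(dim)]
    # Assumed update rule: refined query = attended vector + previous query.
    refined = [a + q for a, q in zip(attended, query)]
    return refined, weights


def stacked_attention(regions, question, layers):
    """Query the image once per layer, refining the query each time."""
    query = question
    history = []
    for W_img, W_qry, w_score in layers:
        query, weights = attention_layer(regions, query, W_img, W_qry, w_score)
        history.append(weights)
    return query, history


def random_layer(rng, dim, hidden):
    """Random placeholder weights; a trained model would learn these."""
    W_img = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    W_qry = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    w_score = [rng.uniform(-1, 1) for _ in range(hidden)]
    return W_img, W_qry, w_score


rng = random.Random(0)
dim, hidden, n_regions = 4, 3, 5
regions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_regions)]
question = [rng.uniform(-1, 1) for _ in range(dim)]
layers = [random_layer(rng, dim, hidden) for _ in range(2)]

final_query, history = stacked_attention(regions, question, layers)
for i, weights in enumerate(history, start=1):
    print(f"layer {i} attention:", [round(p, 3) for p in weights])
```

Each entry of `history` is a distribution over the five toy regions, one per layer, which is the kind of per-layer attention map the abstract says can be visualized. In a real model the refined query would feed an answer classifier; that step is omitted here.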
Pages: 21-29
Page count: 9
Related papers
50 results in total
  • [1] Stacked Self-Attention Networks for Visual Question Answering
    Sun, Qiang
    Fu, Yanwei
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 207 - 211
  • [2] Question Answering with Hierarchical Attention Networks
    Alpay, Tayfun
    Heinrich, Stefan
    Nelskamp, Michael
    Wermter, Stefan
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [3] Stacked Attention based Textbook Visual Question Answering with BERT
    Aishwarya, R.
    Sarath, P.
    Rahman, Shibil P.
    Sneha, U.
    Manmadhan, Sruthy
    2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON, 2022
  • [4] REINFORCEMENT STACKED LEARNING WITH SEMANTIC-ASSOCIATED ATTENTION FOR VISUAL QUESTION ANSWERING
    Xiao, Xinyu
    Zhang, Chunxia
    Xiang, Shiming
    Pan, Chunhong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4170 - 4174
  • [5] Hierarchical Question-Image Co-Attention for Visual Question Answering
    Lu, Jiasen
    Yang, Jianwei
    Batra, Dhruv
    Parikh, Devi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [6] Multi-level Attention Networks for Visual Question Answering
    Yu, Dongfei
    Fu, Jianlong
    Mei, Tao
    Rui, Yong
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4187 - 4195
  • [7] Enhancing Recurrent Neural Networks with Positional Attention for Question Answering
    Chen, Qin
    Hu, Qinmin
    Huang, Jimmy Xiangji
    He, Liang
    An, Weijie
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 993 - 996
  • [8] Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
    Lee, Doyup
    Cheon, Yeongjae
    Han, Wook-Shin
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1845 - 1853
  • [9] Multi-view Attention Networks for Visual Question Answering
    Li, Min
    Bai, Zongwen
    Deng, Jie
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 788 - 794
  • [10] BCA: Bilinear Convolutional Neural Networks and Attention Networks for legal question answering
    Zhang, Haiguang
    Zhang, Tongyue
    Cao, Faxin
    Wang, Zhizheng
    Zhang, Yuanyu
    Sun, Yuanyuan
    Vicente, Mark Anthony
    AI OPEN, 2022, 3 : 172 - 181