Learning visual relationship and context-aware attention for image captioning

Cited by: 107
Authors
Wang, Junbo [1,3]
Wang, Wei [1,3]
Wang, Liang [1,2,3]
Wang, Zhiyong [4]
Feng, David Dagan [4]
Tan, Tieniu [1,2,3]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, CRIPAC, Beijing, Peoples R China
[2] CASIA, CEBSIT, Beijing, Peoples R China
[3] UCAS, Beijing, Peoples R China
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; Australian Research Council
Keywords
Image captioning; Relational reasoning; Context-aware attention; RECOGNITION
DOI
10.1016/j.patcog.2019.107075
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning, which automatically generates natural language descriptions for images, has attracted considerable research attention. Attention-based captioning methods have made substantial progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation. They usually ignore relational reasoning among those regions. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method that implicitly models the relationships among regions of interest in an image with a graph neural network. We also propose a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours not only learns relation-aware visual representations for image captioning but also considers historical context information from previous attention steps. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K. The experimental results indicate that our proposed method outperforms various state-of-the-art methods in terms of widely used evaluation metrics. (C) 2019 Elsevier Ltd. All rights reserved.
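The abstract names two mechanisms: relation modelling among regions with a graph neural network, and attention that remembers previously attended content. The minimal NumPy sketch below illustrates both ideas. It is not the authors' implementation. The following are all assumptions chosen for illustration, with random weights standing in for learned parameters:

- the affinity-based soft adjacency over a fully connected region graph;
- the residual ReLU update;
- the additive attention form;
- the running-sum context memory;
- every name (`relation_gnn_layer`, `context_aware_attention`, the `W_*` matrices).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_gnn_layer(V, W_q, W_k, W_m):
    """One round of message passing over a fully connected region graph.

    V: (N, d) region features. Edge weights are inferred implicitly from
    feature affinity rather than from explicit relation labels (assumption).
    """
    Q, K = V @ W_q, V @ W_k
    A = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=1)  # (N, N) soft adjacency; rows sum to 1
    messages = A @ (V @ W_m)                             # aggregate messages from all regions
    return np.maximum(V + messages, 0.0)                 # residual update followed by ReLU

def context_aware_attention(R, h, c, W_r, W_h, W_c, w):
    """Additive attention whose scores also see a memory c of past attended content.

    R: (N, d) relation-aware regions, h: (d,) decoder state, c: (d,) context memory.
    Returns attention weights, the attended vector, and the updated memory.
    """
    scores = np.tanh(R @ W_r + h @ W_h + c @ W_c) @ w   # (N,) one score per region
    alpha = softmax(scores)
    z = alpha @ R                                        # (d,) attended visual content
    c_new = c + z  # hypothetical memory update: running sum of all attended content
    return alpha, z, c_new

# Usage: random weights and decoder states stand in for a trained captioner.
rng = np.random.default_rng(0)
N, d, k = 5, 8, 6
V = rng.standard_normal((N, d))
W_q, W_k, W_m = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
R = relation_gnn_layer(V, W_q, W_k, W_m)

W_r, W_h, W_c = (rng.standard_normal((d, k)) * 0.1 for _ in range(3))
w = rng.standard_normal(k)
c = np.zeros(d)
attended = []
for t in range(3):
    h = rng.standard_normal(d)  # stand-in for the language decoder's hidden state at step t
    alpha, z, c = context_aware_attention(R, h, c, W_r, W_h, W_c, w)
    attended.append(z)
    print(t, alpha.round(3))
```

Feeding the memory `c` into the score lets earlier attention choices influence later ones, which is the intuition the abstract describes. A real model would learn the weights and might use a gated or recurrent memory instead of a plain sum.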
Pages: 11
Related papers (50 in total)
  • [1] Context-aware transformer for image captioning
    Yang, Xin
    Wang, Ying
    Chen, Haishun
    Li, Jie
    Huang, Tingting
    NEUROCOMPUTING, 2023, 549
  • [2] Image Captioning with Context-Aware Auxiliary Guidance
    Song, Zeliang
    Zhou, Xiaofei
    Mao, Zhendong
    Tan, Jianlong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2584 - 2592
  • [3] Context-aware and co-attention network based image captioning model
    Sharma, Himanshu
    Srivastava, Swati
    IMAGING SCIENCE JOURNAL, 2023, 71 (03): 244 - 256
  • [4] Context-Aware Visual Policy Network for Sequence-Level Image Captioning
    Liu, Daqing
    Zha, Zheng-Jun
    Zhang, Hanwang
    Zhang, Yongdong
    Wu, Feng
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 1416 - 1424
  • [5] Context-Aware Visual Policy Network for Fine-Grained Image Captioning
    Zha, Zheng-Jun
    Liu, Daqing
    Zhang, Hanwang
    Zhang, Yongdong
    Wu, Feng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 710 - 722
  • [6] Visual Relationship Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [7] Meshed Context-Aware Beam Search for Image Captioning
    Zhao, Fengzhi
    Yu, Zhezhou
    Wang, Tao
    Zhao, He
    ENTROPY, 2024, 26 (10)
  • [8] Stacked Multimodal Attention Network for Context-Aware Video Captioning
    Zheng, Yi
    Zhang, Yuejie
    Feng, Rui
    Zhang, Tao
    Fan, Weiguo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 31 - 42
  • [9] Context-aware attention network for image recognition
    Leng, Jiaxu
    Liu, Ying
    Chen, Shang
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (12): 9295 - 9305