Learning visual relationship and context-aware attention for image captioning

Cited by: 107
Authors
Wang, Junbo [1 ,3 ]
Wang, Wei [1 ,3 ]
Wang, Liang [1 ,2 ,3 ]
Wang, Zhiyong [4 ]
Feng, David Dagan [4 ]
Tan, Tieniu [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, CRIPAC, Beijing, Peoples R China
[2] CASIA, CEBSIT, Beijing, Peoples R China
[3] UCAS, Beijing, Peoples R China
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
Image captioning; Relational reasoning; Context-aware attention; RECOGNITION;
DOI
10.1016/j.patcog.2019.107075
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Image captioning, which automatically generates natural language descriptions for images, has attracted substantial research attention, and attention-based captioning methods have made considerable progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation and usually ignore relational reasoning among those regions of interest in an image. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method to implicitly model the relationships among regions of interest in an image with a graph neural network, as well as a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours can not only learn relation-aware visual representations for image captioning, but also consider historical context information from previous attention. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K, and the experimental results indicate that our proposed method outperforms various state-of-the-art methods in terms of the widely used evaluation metrics. (C) 2019 Elsevier Ltd. All rights reserved.
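The abstract describes two components: implicit relational reasoning among regions via a graph neural network, and an attention step conditioned on previously attended content. A minimal numpy sketch of both ideas follows; all names, shapes, and parameterizations here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_step(V, Wq, Wk, Wg):
    """One implicit message-passing step over a fully connected region
    graph: edge weights are learned pairwise affinities (hypothetical
    parameterization). V: (R, D) region features."""
    A = softmax((V @ Wq) @ (V @ Wk).T / np.sqrt(Wq.shape[1]), axis=-1)
    return np.maximum(V + A @ V @ Wg, 0.0)  # relation-aware features, (R, D)

def context_aware_attention(V, h, a_hist, Wv, Wh, wc, w):
    """One attention step that also conditions on a_hist, the accumulated
    attention each region received at earlier decoding steps.
    h: (H,) decoder state; a_hist: (R,) attention history."""
    scores = np.tanh(V @ Wv + Wh.T @ h + np.outer(a_hist, wc)) @ w
    alpha = softmax(scores)   # (R,) attention over regions
    return alpha, alpha @ V   # attended visual feature, (D,)
```

In a decoding loop, each step's `alpha` would be added into `a_hist` so that later steps "remember" which regions were already attended; the paper's actual memory mechanism may differ.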
Pages: 11
Related papers
50 records
  • [21] Textual Context-Aware Dense Captioning With Diverse Words
    Shao, Zhuang
    Han, Jungong
    Debattista, Kurt
    Pang, Yanwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8753 - 8766
  • [22] Feature-attention module for context-aware image-to-image translation
    Bai, Jing
    Chen, Ran
    Liu, Min
    VISUAL COMPUTER, 2020, 36 (10-12): 2145 - 2159
  • [24] Context-Aware Visual Tracking
    Yang, Ming
    Wu, Ying
    Hua, Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (07) : 1195 - 1209
  • [25] Bengali Image Captioning with Visual Attention
    Ami, Amit Saha
    Humaira, Mayeesha
    Jim, Md Abidur Rahman Khan
    Paul, Shimul
    Shah, Faisal Muhammad
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [26] Context-aware Image Compression Optimization for Visual Analytics Offloading
    Chen, Bo
    Yan, Zhisheng
    Nahrstedt, Klara
    PROCEEDINGS OF THE 13TH ACM MULTIMEDIA SYSTEMS CONFERENCE, MMSYS 2022, 2022, : 27 - 38
  • [27] Context-Aware Emotion Recognition Based on Visual Relationship Detection
    Hoang, Manh-Hung
    Kim, Soo-Hyung
    Yang, Hyung-Jeong
    Lee, Guee-Sang
    IEEE ACCESS, 2021, 9 : 90465 - 90474
  • [28] Towards Context-aware Interaction Recognition for Visual Relationship Detection
    Zhuang, Bohan
    Liu, Lingqiao
    Shen, Chunhua
    Reid, Ian
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 589 - 598
  • [29] Exploring region relationships implicitly: Image captioning with visual relationship attention
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    IMAGE AND VISION COMPUTING, 2021, 109
  • [30] Relevant Visual Semantic Context-Aware Attention-Based Dialog
    Hong, Eugene Tan Boon
    Chong, Yung-Wey
    Wan, Tat-Chee
    Yau, Kok-Lim Alvin
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (02): 2337 - 2354