Learning visual relationship and context-aware attention for image captioning

Cited by: 107
Authors
Wang, Junbo [1, 3]
Wang, Wei [1, 3]
Wang, Liang [1, 2, 3]
Wang, Zhiyong [4]
Feng, David Dagan [4]
Tan, Tieniu [1, 2, 3]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, CRIPAC, Beijing, Peoples R China
[2] CASIA, CEBSIT, Beijing, Peoples R China
[3] UCAS, Beijing, Peoples R China
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
Image captioning; Relational reasoning; Context-aware attention; RECOGNITION;
DOI
10.1016/j.patcog.2019.107075
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning, which automatically generates natural language descriptions for images, has attracted considerable research attention, and attention-based captioning methods have made substantial progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation and usually ignore relational reasoning among those regions in an image. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method that implicitly models the relationships among regions of interest in an image with a graph neural network, as well as a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours not only learns relation-aware visual representations for image captioning but also considers historical context information on previous attention. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K, and the experimental results indicate that our proposed method outperforms various state-of-the-art methods on the widely used evaluation metrics. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 11