Learning visual relationship and context-aware attention for image captioning

Cited by: 107

Authors:

Wang, Junbo [1,3]

Wang, Wei [1,3]

Wang, Liang [1,2,3]

Wang, Zhiyong [4]

Feng, David Dagan [4]

Tan, Tieniu [1,2,3]

Affiliations:

[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, CRIPAC, Beijing, Peoples R China

[2] CASIA, CEBSIT, Beijing, Peoples R China

[3] UCAS, Beijing, Peoples R China

[4] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia

Source:

PATTERN RECOGNITION | 2020 / Vol. 98

Funding:

National Natural Science Foundation of China; Australian Research Council;

Keywords:

Image captioning; Relational reasoning; Context-aware attention; RECOGNITION;

DOI:

10.1016/j.patcog.2019.107075

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image captioning, which automatically generates natural language descriptions for images, has attracted considerable research attention, and attention-based captioning methods have made substantial progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation and usually ignore relational reasoning among those regions of interest in an image. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method that implicitly models the relationships among regions of interest in an image with a graph neural network, as well as a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours not only learns relation-aware visual representations for image captioning but also considers historical context information from previous attention. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K, and the experimental results indicate that our proposed method outperforms various state-of-the-art methods on widely used evaluation metrics. (C) 2019 Elsevier Ltd. All rights reserved.

Pages: 11
