Learning visual relationship and context-aware attention for image captioning

Cited by: 107
Authors
Wang, Junbo [1 ,3 ]
Wang, Wei [1 ,3 ]
Wang, Liang [1 ,2 ,3 ]
Wang, Zhiyong [4 ]
Feng, David Dagan [4 ]
Tan, Tieniu [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit NLPR, CRIPAC, Beijing, Peoples R China
[2] CASIA, CEBSIT, Beijing, Peoples R China
[3] UCAS, Beijing, Peoples R China
[4] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
Image captioning; Relational reasoning; Context-aware attention; RECOGNITION;
DOI
10.1016/j.patcog.2019.107075
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Image captioning, which automatically generates natural language descriptions for images, has attracted substantial research attention, and attention-based captioning methods have made considerable progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation and usually ignore relational reasoning among those regions of interest in an image. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method to implicitly model the relationships among regions of interest in an image with a graph neural network, as well as a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours can not only learn relation-aware visual representations for image captioning, but also consider historical context information from previous attention. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K, and the experimental results indicate that our proposed method outperforms various state-of-the-art methods in terms of the widely used evaluation metrics. (C) 2019 Elsevier Ltd. All rights reserved.
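The abstract describes two components: implicit relational reasoning among regions via a graph neural network, and an attention step conditioned on previously attended content. A minimal numpy sketch of both ideas follows; all names, shapes, and parameterizations here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_step(V, Wq, Wk, Wg):
    """One implicit message-passing step over a fully connected region
    graph: edge weights are learned pairwise affinities (hypothetical
    parameterization). V: (R, D) region features."""
    A = softmax((V @ Wq) @ (V @ Wk).T / np.sqrt(Wq.shape[1]), axis=-1)
    return np.maximum(V + A @ V @ Wg, 0.0)  # relation-aware features, (R, D)

def context_aware_attention(V, h, a_hist, Wv, Wh, wc, w):
    """One attention step that also conditions on a_hist, the accumulated
    attention each region received at earlier decoding steps.
    h: (H,) decoder state; a_hist: (R,) attention history."""
    scores = np.tanh(V @ Wv + Wh.T @ h + np.outer(a_hist, wc)) @ w
    alpha = softmax(scores)   # (R,) attention over regions
    return alpha, alpha @ V   # attended visual feature, (D,)
```

In a decoding loop, each step's `alpha` would be added into `a_hist` so that later steps "remember" which regions were already attended; the paper's actual memory mechanism may differ.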
Pages: 11
Related papers
50 records
  • [21] Textual Context-Aware Dense Captioning With Diverse Words
    Shao, Zhuang
    Han, Jungong
    Debattista, Kurt
    Pang, Yanwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8753 - 8766
  • [22] Feature-attention module for context-aware image-to-image translation
    Bai, Jing
    Chen, Ran
    Liu, Min
    VISUAL COMPUTER, 2020, 36 (10-12): 2145 - 2159
  • [24] Context-Aware Visual Tracking
    Yang, Ming
    Wu, Ying
    Hua, Gang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (07) : 1195 - 1209
  • [25] Bengali Image Captioning with Visual Attention
    Ami, Amit Saha
    Humaira, Mayeesha
    Jim, Md Abidur Rahman Khan
    Paul, Shimul
    Shah, Faisal Muhammad
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [26] Context-aware Image Compression Optimization for Visual Analytics Offloading
    Chen, Bo
    Yan, Zhisheng
    Nahrstedt, Klara
    PROCEEDINGS OF THE 13TH ACM MULTIMEDIA SYSTEMS CONFERENCE, MMSYS 2022, 2022, : 27 - 38
  • [27] Context-Aware Emotion Recognition Based on Visual Relationship Detection
    Hoang, Manh-Hung
    Kim, Soo-Hyung
    Yang, Hyung-Jeong
    Lee, Guee-Sang
    IEEE ACCESS, 2021, 9 : 90465 - 90474
  • [28] Towards Context-aware Interaction Recognition for Visual Relationship Detection
    Zhuang, Bohan
    Liu, Lingqiao
    Shen, Chunhua
    Reid, Ian
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 589 - 598
  • [29] Exploring region relationships implicitly: Image captioning with visual relationship attention
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    IMAGE AND VISION COMPUTING, 2021, 109
  • [30] Relevant Visual Semantic Context-Aware Attention-Based Dialog
    Hong, Eugene Tan Boon
    Chong, Yung-Wey
    Wan, Tat-Chee
    Yau, Kok-Lim Alvin
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (02): 2337 - 2354