Attribute-driven image captioning via soft-switch pointer

Cited by: 8

Authors:

Zhou, Yujie [1,2]

Long, Jiefeng [1,2]

Xu, Suping [1,2]

Shang, Lin [1,2]

Affiliations:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China

[2] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210023, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2021, Vol. 152

Funding:

National Natural Science Foundation of China;

Keywords:

Image captioning; Visual attributes detection; Attention; Pointing mechanism;

DOI:

10.1016/j.patrec.2021.08.021

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual attribute detection provides rich semantic concepts for image captioning. Some previous methods directly encode the attributes into vectors and generate the corresponding captions, which ignores the correlations between image regions and attributes. In this paper, we aim to bridge the gap between visual features and detected attributes: first look at a specific region of the image, then decide which attribute to attend to. We propose an attribute-driven image captioning approach consisting of two parts: the visual positioning part and the attribute selection part. Specifically, we introduce the pointer-generator network into the second part of our model as a soft switch, which determines at each decoding step whether to generate a word from the hidden state or to point to a detected attribute. Qualitative and quantitative experiments show that our model can improve the coverage of key visual attributes and significantly boost the overall performance. (c) 2021 Published by Elsevier B.V.
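
The abstract gives no equations, so the following is only a minimal sketch of how a soft switch of this kind is commonly formulated in pointer-generator networks, not the authors' exact model. It assumes the final word distribution is a gated mixture of a vocabulary distribution and an attention distribution over detected attributes. The function `soft_switch_step`, its parameters, and the toy scores are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_switch_step(vocab, vocab_logits, attributes, attr_scores, gate_logit):
    """One decoding step of an assumed pointer-generator style soft switch.

    p_gen is the probability of generating from the vocabulary;
    (1 - p_gen) is the probability of pointing to a detected attribute.
    In a real model all three inputs would be computed from the decoder
    hidden state; here they are passed in directly (illustrative only).
    """
    p_gen = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate
    p_vocab = softmax(vocab_logits)              # generation distribution
    attn = softmax(attr_scores)                  # pointer distribution

    # Start from the gated vocabulary distribution.
    final = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    # Add gated pointer mass; attributes outside the vocabulary get new entries.
    for attr, a in zip(attributes, attn):
        final[attr] = final.get(attr, 0.0) + (1.0 - p_gen) * a
    return p_gen, final


# Toy example: "frisbee" is a detected attribute that is not in the vocabulary.
vocab = ["a", "dog", "on", "grass"]
p_gen, dist = soft_switch_step(
    vocab,
    vocab_logits=[1.0, 0.5, 0.2, 0.1],
    attributes=["dog", "frisbee"],
    attr_scores=[0.3, 2.0],
    gate_logit=-1.0,  # gate leans toward pointing
)
best = max(dist, key=dist.get)
print(round(p_gen, 3), best)
```

With the gate leaning toward pointing, the out-of-vocabulary attribute "frisbee" receives the largest share of probability mass. This is the behaviour a soft switch is meant to allow: a detected attribute can appear in the caption even when the generator alone would not produce it.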


Pages: 34-41

Page count: 8
