FFGS: Feature Fusion with Gating Structure for Image Caption Generation

Times cited: 5

Authors:

Yuan, Aihong [1, 2]

Li, Xuelong [1]

Lu, Xiaoqiang [1]

Affiliations:

[1] Chinese Academy of Sciences, Xi'an Institute of Optics and Precision Mechanics, Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi'an 710119, Shaanxi, People's Republic of China

[2] University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, People's Republic of China

Source:

COMPUTER VISION, PT I | 2017, Vol. 771

Keywords:

Image caption generation; Recurrent neural network; Convolutional neural network; Multi-modal embedding; Feature fusion

DOI:

10.1007/978-981-10-7299-4_53

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatically generating a natural language description of a given image is a challenging task at the intersection of computer vision and natural language processing. It is challenging because a computer must not only recognize the objects in an image, their attributes, and the relationships between them, but also express these elements as a natural language sentence. This paper proposes a feature fusion with gating structure (FFGS) for image caption generation. First, a pre-trained VGG-19 network is used as the image feature extractor: the outputs of its FC-7 and CONV5-4 layers serve as the global and local image features, respectively. Second, the image features and the corresponding sentence are fed into an LSTM to learn the relationship between them. At each time step, the global image feature is gated before it is fed into the LSTM, while the local image feature is processed by an attention model. Experimental results show that the proposed method outperforms state-of-the-art methods.
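The abstract describes in words only how the two VGG-19 features enter the LSTM: the FC-7 vector (4096-d) is gated at every time step, and the CONV5-4 map (14 × 14 × 512 for a 224 × 224 input, i.e. 196 region vectors) is attended over. As an illustration, here is a minimal sketch of one such decoding step in Python/NumPy, resting on assumptions the abstract does not confirm: the gate is taken to be g_t = σ(W_g h_{t-1} + b_g) applied elementwise to the global feature, the attention is standard additive (Bahdanau-style) soft attention over regions, and the word embedding, gated global vector and attended context are simply concatenated as the LSTM input. The class name `GatedAttentionLSTMStep` and all weight names are hypothetical, and the dimensions are shrunk for the toy demo.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class GatedAttentionLSTMStep:
    """One decoding step: gated global feature + attended local features -> LSTM.

    Hypothetical parameterisation for illustration; NOT the paper's exact equations.
    """

    def __init__(self, d_global, d_local, d_word, d_hidden, d_att, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        # Assumed gate on the global (FC-7-like) feature, conditioned on h_{t-1}
        self.W_g, self.b_g = init(d_global, d_hidden), np.zeros(d_global)
        # Assumed additive attention over local (CONV5-4-like) region vectors
        self.W_a, self.U_a, self.v_a = init(d_att, d_local), init(d_att, d_hidden), init(d_att)
        # LSTM weights; input = [word; gated global; attended context; h_{t-1}]
        d_in = d_word + d_global + d_local + d_hidden
        self.W, self.b = init(4 * d_hidden, d_in), np.zeros(4 * d_hidden)
        self.H = d_hidden

    def __call__(self, word_emb, v_global, V_local, h_prev, c_prev):
        # 1) Gate: per-dimension weights in (0, 1) rescale the global feature
        g = sigmoid(self.W_g @ h_prev + self.b_g)
        gated_global = g * v_global
        # 2) Attention: one score per region (V_local has shape [regions, d_local])
        scores = np.tanh(V_local @ self.W_a.T + self.U_a @ h_prev) @ self.v_a
        alpha = softmax(scores)
        context = alpha @ V_local  # weighted sum of region vectors
        # 3) Standard LSTM cell update (gate order i, f, o, candidate)
        z = self.W @ np.concatenate([word_emb, gated_global, context, h_prev]) + self.b
        H = self.H
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        c = f * c_prev + i * np.tanh(z[3 * H:])
        h = o * np.tanh(c)
        return h, c, alpha, g


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    step = GatedAttentionLSTMStep(d_global=32, d_local=16, d_word=8, d_hidden=20, d_att=12)
    v_global = rng.normal(size=32)        # toy stand-in for a 4096-d FC-7 vector
    V_local = rng.normal(size=(196, 16))  # 196 = 14 x 14 regions; real CONV5-4 regions are 512-d
    h, c = np.zeros(20), np.zeros(20)
    for t in range(3):  # a few steps with random stand-in word embeddings
        h, c, alpha, g = step(rng.normal(size=8), v_global, V_local, h, c)
        print(t, int(alpha.argmax()), round(float(g.mean()), 3))
```

Because the assumed gate is a sigmoid, it can only attenuate each dimension of the global feature, never amplify it. This is one plausible reading of "gated at each time step"; a scalar gate, or a gate that also conditions on the current word, would be equally consistent with the abstract.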


Pages: 638-649

Number of pages: 12
