FFGS: Feature Fusion with Gating Structure for Image Caption Generation

Cited by: 5
Authors
Yuan, Aihong [1, 2]
Li, Xuelong [1]
Lu, Xiaoqiang [1]
Affiliations
[1] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710119, Shaanxi, Peoples R China
[2] Univ Chinese Acad Sci, 19A Yuquanlu, Beijing 100049, Peoples R China
Source
COMPUTER VISION, PT I | 2017 / Vol. 771
Keywords
Image caption generation; Recurrent neural network; Convolutional neural network; Multi-modal embedding; Feature fusion;
DOI
10.1007/978-981-10-7299-4_53
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatically generating a natural-language description of the content of a given image is a challenging task at the intersection of computer vision and natural language processing. The task is challenging because a computer must not only recognize the objects in an image, their attributes, and the relationships between them, but also express these elements in a natural-language sentence. This paper proposes a feature fusion with gating structure (FFGS) for image caption generation. First, a pre-trained VGG-19 is used as the image feature extractor: the outputs of the FC-7 and CONV5-4 layers serve as the global and local image features, respectively. Second, the image features and the corresponding sentence are fed into an LSTM to learn their relationship. The global image feature is gated at each time step before being fed into the LSTM, while the local image feature is processed by an attention model. Experimental results show that the method outperforms state-of-the-art methods.
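
The two fusion paths named in the abstract (a gate on the global feature at each time step, and attention over the local features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the sigmoid gate conditioned on the previous hidden state, the additive attention form, and all names and sizes (`gate_global_feature`, `attend_local_features`, `W_g`, `U_g`, `W_v`, `W_h`, `w_a`) are assumptions chosen for illustration. With a 224x224 input, VGG-19 would give a 4096-dimensional FC-7 vector and 14x14 = 196 regions of 512 dimensions from CONV5-4; the demo uses much smaller toy sizes.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()


def gate_global_feature(v_global, h_prev, W_g, U_g, b_g):
    """Scale the global feature element-wise by a gate in (0, 1).

    Assumed form: g = sigmoid(W_g v + U_g h_prev + b_g); output is g * v.
    """
    gate = sigmoid(W_g @ v_global + U_g @ h_prev + b_g)
    return gate * v_global


def attend_local_features(V_local, h_prev, W_v, W_h, w_a):
    """Soft (additive) attention over region features.

    V_local has shape (regions, dim). Returns the weighted context
    vector of shape (dim,) and the attention weights of shape (regions,).
    """
    scores = np.tanh(V_local @ W_v.T + h_prev @ W_h.T) @ w_a
    alpha = softmax(scores)
    context = alpha @ V_local
    return context, alpha


# Toy sizes (hypothetical); real VGG-19 features would be 4096-d and 196 x 512.
rng = np.random.default_rng(0)
d_global, d_local, regions, hidden, d_att = 8, 6, 5, 4, 3

v_global = rng.normal(size=d_global)
V_local = rng.normal(size=(regions, d_local))
h_prev = rng.normal(size=hidden)  # LSTM hidden state from the previous step

W_g = rng.normal(size=(d_global, d_global))
U_g = rng.normal(size=(d_global, hidden))
b_g = np.zeros(d_global)
W_v = rng.normal(size=(d_att, d_local))
W_h = rng.normal(size=(d_att, hidden))
w_a = rng.normal(size=d_att)

gated = gate_global_feature(v_global, h_prev, W_g, U_g, b_g)
context, alpha = attend_local_features(V_local, h_prev, W_v, W_h, w_a)

# One plausible way to form the LSTM input at this time step (an assumption):
lstm_visual_input = np.concatenate([gated, context])
```

In a full captioning model, both functions would be called once per time step with the updated hidden state, so the gate and the attention weights can change as the sentence is generated.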
Pages: 638 - 649
Page count: 12
Related papers
50 in total
  • [1] Boosting image caption generation with feature fusion module
    Pengfei Xia
    Jingsong He
    Jin Yin
    Multimedia Tools and Applications, 2020, 79 : 24225 - 24239
  • [2] Boosting image caption generation with feature fusion module
    Xia, Pengfei
    He, Jingsong
    Yin, Jin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (33-34) : 24225 - 24239
  • [3] Image-Caption Model Based on Fusion Feature
    Geng, Yaogang
    Mei, Hongyan
    Xue, Xiaorong
    Zhang, Xing
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [4] Image Caption Automatic Generation Method Based on Weighted Feature
    Xi, Su Mei
    Cho, Young Im
    2013 13TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2013), 2013, : 548 - 551
  • [5] 3G structure for image caption generation
    Yuan, Aihong
    Li, Xuelong
    Lu, Xiaoqiang
    NEUROCOMPUTING, 2019, 330 : 17 - 28
  • [6] Neural Image Caption Generation with Global Feature Based Attention Scheme
    Wang, Yongzhuang
    Xiong, Hongkai
    IMAGE AND GRAPHICS (ICIG 2017), PT II, 2017, 10667 : 51 - 61
  • [7] A NOVEL SEMANTIC ATTRIBUTE-BASED FEATURE FOR IMAGE CAPTION GENERATION
    Wang, Wei
    Ding, Yuxuan
    Tian, Chunna
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3081 - 3085
  • [8] TVPRNN for image caption generation
    Yang, Liang
    Hu, Haifeng
    ELECTRONICS LETTERS, 2017, 53 (22) : 1471 - +
  • [9] CNN image caption generation
    Li Y.
    Cheng H.
    Liang X.
    Guo Q.
    Qian Y.
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (02): : 152 - 157
  • [10] A PARALL-FUSION RNN-LSTM ARCHITECTURE FOR IMAGE CAPTION GENERATION
    Wang, Minsi
    Song, Li
    Yang, Xiaokang
    Luo, Chuanfei
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4448 - 4452