A SEQUENTIAL GUIDING NETWORK WITH ATTENTION FOR IMAGE CAPTIONING

Cited by: 0
Authors
Sow, Daouda [1]
Qin, Zengchang [1, 2]
Niasse, Mouhamed [3]
Wan, Tao [1]
Affiliations
[1] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[2] Keep Inc, Keep Labs, Beijing, Peoples R China
[3] North China Elect Power Univ, Sch EEE, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Recent advances in deep learning for both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, making it possible to tackle more challenging tasks such as automatically generating descriptions of natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model extends the encoder-decoder framework with attention by adding a guiding long short-term memory (LSTM), and it can be trained end to end using image/description pairs. We validate our approach through extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves significant improvement compared with other state-of-the-art deep learning models.
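The abstract does not specify how the guiding LSTM interacts with attention, so the following is only a minimal sketch of one soft-attention step in a generic encoder-decoder captioner, not the authors' method. The function `attend`, the `guide` vector (a stand-in for the output of a guiding LSTM), and the choice to add it to the decoder state to form the query are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(regions, hidden, guide):
    """One soft-attention step over image region features.

    regions: list of feature vectors, one per image region (CNN encoder output)
    hidden:  decoder state at the current word-generation step
    guide:   hypothetical guiding vector (assumed, not taken from the paper)
    """
    # Assumption: the attention query is the elementwise sum of state and guide.
    query = [h + g for h, g in zip(hidden, guide)]
    # Dot-product scores, normalized into attention weights.
    weights = softmax([dot(r, query) for r in regions])
    # Context vector: attention-weighted average of the region features.
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, context

# Toy input: three 2-dimensional region features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(regions, hidden=[0.5, 0.0], guide=[0.5, 0.0])
```

In this toy call the query points along the first feature dimension, so the first and third regions receive equal weight and the second region receives less. A real model would learn projection matrices for the query and the region features instead of using raw dot products.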
Pages: 3802-3806
Number of pages: 5
Related papers
50 in total
  • [1] Image Captioning with Affective Guiding and Selective Attention
    Wang, Anqi
    Hu, Haifeng
    Yang, Liang
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (03)
  • [2] REFINING ATTENTION: A SEQUENTIAL ATTENTION MODEL FOR IMAGE CAPTIONING
    Fang, Fang
    Li, Qinyu
    Wang, Hanli
    Tang, Pengjie
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [3] Hierarchical Attention Network for Image Captioning
    Wang, Weixuan
    Chen, Zhihong
    Hu, Haifeng
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8957 - 8964
  • [4] Hybrid attention network for image captioning
    Jiang, Wenhui
    Li, Qin
    Zhan, Kun
    Fang, Yuming
    Shen, Fei
    [J]. DISPLAYS, 2022, 73
  • [5] Multivariate Attention Network for Image Captioning
    Wang, Weixuan
    Chen, Zhihong
    Hu, Haifeng
    [J]. COMPUTER VISION - ACCV 2018, PT VI, 2019, 11366 : 587 - 602
  • [6] Guiding Attention using Partial-Order Relationships for Image Captioning
    Popattia, Murad
    Rafi, Muhammad
    Qureshi, Rizwan
    Nawaz, Shah
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4670 - 4679
  • [7] Sequential Transformer via an Outside-In Attention for image captioning
    Wei, Yiwei
    Wu, Chunlei
    Li, Guohe
    Shi, Haitao
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 108
  • [8] Attention on Attention for Image Captioning
    Huang, Lun
    Wang, Wenmin
    Chen, Jie
    Wei, Xiao-Yong
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4633 - 4642
  • [9] Image captioning using DenseNet network and adaptive attention
    Deng, Zhenrong
    Jiang, Zhouqin
    Lan, Rushi
    Huang, Wenming
    Luo, Xiaonan
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 85
  • [10] Multi-Keys Attention Network for Image Captioning
    Yang, Ziqian
    Li, Hui
    Ouyang, Renrong
    Zhang, Quan
    Xiao, Jimin
    [J]. COGNITIVE COMPUTATION, 2024, 16 (03) : 1061 - 1072