Image Caption via Visual Attention Switch on DenseNet

Cited by: 0
Authors
Hao, Yanlong [1 ]
Xie, Jiyang [1 ]
Lin, Zhiqing [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Pattern Recognit & Intelligent Syst Lab, Beijing 100876, Peoples R China
Keywords
Image caption; Visual attention switch; Encoder-decoder architecture; DenseNet; LSTM;
DOI: not available
CLC number: TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes: 0808; 0809
Abstract
We introduce a novel approach for converting images into corresponding natural-language descriptions. The method follows the popular encoder-decoder architecture: the encoder uses the recently proposed densely connected convolutional network (DenseNet) to extract feature maps, while the decoder uses a long short-term memory (LSTM) network to parse the feature maps into descriptions. We predict the next word of a description by effectively combining the feature maps with the word embedding of the current input word via a "visual attention switch". Finally, we compare the proposed model against baseline models and achieve good results.
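The abstract only sketches the mechanism, so the following is a minimal, hypothetical numpy illustration of one decoding step under common assumptions: soft attention over DenseNet region features conditioned on the LSTM hidden state, plus a scalar sigmoid "switch" gate that decides how much attended visual context to blend into the current word embedding before it is fed to the LSTM. All shapes, weight names (`W_att`, `W_gate`, `W_ctx`), and the gating form are assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative dimensions (not from the paper):
# 49 spatial regions of 1024-d DenseNet features; 256-d embeddings; 512-d hidden state.
R, Dv, De, H = 49, 1024, 256, 512

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters of the attention scorer, the switch gate,
# and a projection of the visual context to the embedding size.
W_att = rng.normal(scale=0.01, size=(Dv + H, 1))
W_gate = rng.normal(scale=0.01, size=(De + H, 1))
W_ctx = rng.normal(scale=0.01, size=(Dv, De))

def decode_step(features, embed, h_prev):
    """Build one LSTM input with a visual attention switch (sketch).

    features: (R, Dv) DenseNet region features
    embed:    (De,)   embedding of the current input word
    h_prev:   (H,)    previous LSTM hidden state
    """
    # Soft attention: score each region against the hidden state.
    scores = np.concatenate([features, np.tile(h_prev, (R, 1))], axis=1) @ W_att
    alpha = softmax(scores.ravel())          # (R,) attention weights, sum to 1
    context = alpha @ features               # (Dv,) attended visual context

    # "Switch": scalar gate in [0, 1] choosing how much visual context
    # to mix with the word embedding at this step.
    s = sigmoid(np.concatenate([embed, h_prev]) @ W_gate).item()
    x_t = (1.0 - s) * embed + s * (context @ W_ctx)  # (De,) LSTM input
    return x_t, alpha, s

feats = rng.normal(size=(R, Dv))
emb = rng.normal(size=De)
x_t, alpha, s = decode_step(feats, emb, np.zeros(H))
print(x_t.shape, alpha.shape, s)
```

In a full model this step would run inside the decoding loop, with `x_t` fed to the LSTM cell and the output softmax predicting the next word; here it only shows how the gate interpolates between purely textual and visually grounded inputs.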
Pages: 334-338 (5 pages)