A cooperative approach based on self-attention with interactive attribute for image caption

Cited by: 10
Authors
Zhao, Dexin [1]
Yang, Ruixue [1]
Wang, Zhaohui [1]
Qi, Zhiyang [1]
Affiliations
[1] Tianjin Univ Technol, Tianjin Key Lab Intelligence Comp & Novel Softwar, Tianjin 300384, Peoples R China
Keywords
Image caption; Deep neural network; Self-attention;
DOI
10.1007/s11042-022-13279-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image captioning is a challenging problem in image understanding. Most models are trained within a framework that combines a deep convolutional neural network with a recurrent neural network. However, the features extracted by the convolutional neural network mainly capture the information of salient regions and fail to cover the details in the image. Moreover, the vanishing gradient problem of recurrent neural networks causes earlier information to be lost as the time step grows. In this paper, Cooperative Self-Attention (CSA) is proposed to address these problems. Compared with existing methods, our model enhances the image representation by fusing additional attribute information from object detection. A sub-module named Inter-Attribute, which indicates the interaction between objects, is proposed to strengthen the context of the entities. By virtue of the advantages of Self-Attention, and unlike previous methods that predict the next word from one prior word and the hidden state, our model concatenates all of the words generated step by step to handle long-term dependencies. Compared with published state-of-the-art methods, our CSA demonstrates outstanding performance.
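The abstract's point about attending to all generated words, rather than to one prior word and a hidden state, can be illustrated with a minimal sketch. This is not the authors' CSA implementation: the function names `softmax` and `attend_over_prefix`, the toy embeddings, and the choice to use raw word vectors as queries, keys, and values (no learned projections, no attribute fusion) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_prefix(prefix):
    """Context vector for the latest word, attending over ALL generated words.

    prefix: list of equal-length word vectors, oldest first.
    Hypothetical simplification: queries, keys and values are the raw
    vectors themselves (no learned projection matrices).
    """
    query = prefix[-1]
    dim = len(query)
    # Scaled dot-product score between the latest word and every word so far.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in prefix]
    weights = softmax(scores)
    # Weighted sum of the value vectors gives the context for the next step.
    context = [sum(w * vec[i] for w, vec in zip(weights, prefix))
               for i in range(dim)]
    return weights, context

# Toy embeddings for three already-generated words (made-up numbers).
generated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend_over_prefix(generated)
```

Because every earlier word contributes directly to the context vector, no information has to survive a chain of recurrent updates, which is the long-term dependency argument the abstract makes.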
Pages: 1223 - 1236
Number of pages: 14