A cooperative approach based on self-attention with interactive attribute for image caption

Cited by: 10
Authors
Zhao, Dexin [1 ]
Yang, Ruixue [1 ]
Wang, Zhaohui [1 ]
Qi, Zhiyang [1 ]
Affiliations
[1] Tianjin Univ Technol, Tianjin Key Lab Intelligence Comp & Novel Softwar, Tianjin 300384, Peoples R China
Keywords
Image caption; Deep neural network; Self-attention
DOI
10.1007/s11042-022-13279-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning is a challenging problem in image understanding. Most models are trained in a framework that combines a deep convolutional neural network with a recurrent neural network. However, the features extracted by the convolutional neural network capture mainly the information of salient regions and fail to cover the details of the image. Moreover, the vanishing gradient problem of recurrent neural networks causes earlier information to be lost as the time step grows. In this paper, Cooperative Self-Attention (CSA) is proposed to address these problems. Compared with existing methods, our model enhances the representation of the image by fusing additional attribute information from object detection. A sub-module named Inter-Attribute, which indicates the interaction of objects, is proposed to strengthen the context of the entities. By virtue of the advantages of Self-Attention, and unlike previous methods that predict the next word from one prior word and the hidden state, our model concatenates all of the words generated step by step to handle long-term dependencies. Compared with published state-of-the-art methods, our CSA demonstrates outstanding performance.
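The abstract's central idea, predicting the next word from all previously generated words rather than from one prior word and a hidden state, can be illustrated with a minimal sketch of scaled dot-product self-attention over a word prefix. This is not the authors' CSA model: the function names (`softmax`, `dot`, `attend_over_prefix`) and the toy embeddings are hypothetical, and the learned query/key/value projections, the attribute fusion, and the Inter-Attribute sub-module are all omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_over_prefix(prefix_vectors):
    """Let the latest word attend to every word generated so far.

    Assumption: identity projections, so each word embedding serves
    as its own query, key, and value (a simplification, not the
    paper's parameterization).
    """
    query = prefix_vectors[-1]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in prefix_vectors])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, prefix_vectors))
        for i in range(len(query))
    ]
    return weights, context


# Toy 2-dimensional embeddings for three already-generated words.
prefix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend_over_prefix(prefix)
print(weights)   # one weight per word in the prefix
print(context)   # weighted summary of the whole prefix
```

Because every earlier word receives its own attention weight at each step, no information has to survive a chain of recurrent updates, which is the long-term dependency issue the abstract attributes to recurrent networks.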
Pages: 1223 - 1236
Number of pages: 14
Related papers
50 in total
  • [1] A cooperative approach based on self-attention with interactive attribute for image caption
    Dexin Zhao
    Ruixue Yang
    Zhaohui Wang
    Zhiyang Qi
    Multimedia Tools and Applications, 2023, 82 : 1223 - 1236
  • [2] A Novel Cross Channel Self-Attention based Approach for Facial Attribute Editing
    Xu, Meng
    Jin, Rize
    Lu, Liangfu
    Chung, Tae-Sun
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (06) : 2115 - 2127
  • [3] Caption Generation Based on Emotions Using CSPDenseNet and BiLSTM with Self-Attention
    Priya, S. Kavi
    Karthika, Pon K.
    Kaliappan, Jayakumar
    Selvaraj, Senthil Kumaran
    Nagalakshmi, R.
    Molla, Baye
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2022, 2022
  • [4] Pedestrian Attribute Recognition Based on Dual Self-attention Mechanism
    Fan, Zhongkui
    Guan, Ye-peng
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (02) : 793 - 812
  • [5] Infrared and Visible Image Fusion Method via Interactive Self-attention
    Yang Fan
    Wang Zhishe
    Sun Jing
    Yu Zhaofa
    ACTA PHOTONICA SINICA, 2024, 53 (06)
  • [6] ISANet: An Interactive Self-attention Network for Cropland Image Change Detection
    Dong, Sijun
    Chen, Yanrui
    Wu, Fann
    Gao, Zhi
    Meng, Xiaoliang
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024, 2024 : 862 - 867
  • [7] A Dual Self-Attention based Network for Image Captioning
    Li, ZhiYong
    Yang, JinFu
    Li, YaPing
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021 : 1590 - 1595
  • [8] Local Attribute Attention Network for Minority Clothing Image Caption Generation
    Xuhui Z.
    Li L.
    Xiaodong F.
    Lijun L.
    Wei P.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2024, 36 (03) : 399 - 412
  • [9] Clothes image caption generation with attribute detection and visual attention model
    Li, Xianrui
    Ye, Zhiling
    Zhang, Zhao
    Zhao, Mingbo
    PATTERN RECOGNITION LETTERS, 2021, 141 (141) : 68 - 74
  • [10] Research of Self-Attention in Image Segmentation
    Cao, Fude
    Zheng, Chunguang
    Huang, Limin
    Wang, Aihua
    Zhang, Jiong
    Zhou, Feng
    Ju, Haoxue
    Guo, Haitao
    Du, Yuxia
    JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2022, 15 (01)