Application of SENet generative adversarial network in image semantics description

Cited by: 0
Authors
Liu Z. [1,3]
Chen H. [1,3]
Hu W. [2]
Affiliations
[1] College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou
[2] College of Mathematic and Computer Science, Northwest Minzu University, Lanzhou
[3] Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou
Keywords
adversarial training; feature extraction; generator model; image semantics description; SENet networks
DOI
10.37188/OPE.20233109.1379
Abstract
An SENet-based generative adversarial network method for image semantics description is proposed to address the problems of inaccurate description sentences and insufficient emotional color in existing image semantics descriptions. First, a channel attention mechanism is added to the feature extraction stage of the generator model so that the network can fully extract features from the salient regions of an image; the extracted image features are then input to the encoder. Second, a sentiment corpus is added to the original text corpus, and word vectors are generated through natural language processing. These word vectors are combined with the encoded image features and fed to the decoder, which, through continuous adversarial training, generates sentiment-bearing description sentences that match the content depicted in the image. The proposed method is compared with existing methods in simulation experiments: it improves the BLEU metric by approximately 15% over the SentiCap method, with improvements also observed in other related metrics, and in self-comparison experiments it improves the CIDEr metric by approximately 3%. Thus, the proposed network extracts image features more effectively, yielding description sentences that are more accurate and richer in emotional color. © 2023 Chinese Academy of Sciences. All rights reserved.
Pages: 1379-1389 (10 pages)
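As a rough illustration of the channel attention mechanism described in the abstract, the following is a minimal sketch of a squeeze-and-excitation (SE) block in PyTorch. The reduction ratio, feature-map shape, and placement inside the generator's feature extractor are illustrative assumptions, not the paper's reported settings.

```python
# Minimal squeeze-and-excitation (SE) channel-attention block (illustrative sketch).
# The reduction ratio r=16 and the 256x14x14 feature map are assumptions, not the
# paper's exact configuration.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average per channel
        self.fc = nn.Sequential(                     # excitation: learn per-channel gates
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                  # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)              # (B, C, 1, 1) attention weights in [0, 1]
        return x * w                                 # reweight feature maps channel-wise

# Hypothetical usage: reweight features from the generator's CNN feature extractor
# so that salient image regions contribute more strongly to the encoder input.
feats = torch.randn(2, 256, 14, 14)
attended = SEBlock(256)(feats)                       # same shape, salient channels emphasized
```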