Research on image captioning using dilated convolution ResNet and attention mechanism

Cited by: 0
Authors
[1] Li, Haisheng
[2] Yuan, Rongrong
[3] Li, Qiuyi
[4] Hu, Cong
Funding
National Natural Science Foundation of China
DOI
10.1007/s00530-024-01653-w
Abstract
Image captioning, the task of generating a textual description of a given image's content, is recognized as a key problem in visual-to-linguistic tasks. The main challenge is to capture both local and global image features accurately while keeping the model efficient and its computational cost low. In this work, we introduce dilated convolution to enlarge the receptive field, which better captures an image's details and contextual information and extracts richer image features. A sparse multilayer perceptron is combined with an attention mechanism to enhance the extraction of detailed features and the attention paid to essential feature regions, thus improving the network's expressive ability and feature selection. Furthermore, a residual squeeze-and-excitation module is added to help the model better understand the image content, thus improving the accuracy of the image captioning task. Experimental results on the Flickr8k and Flickr30k datasets show that the proposed method improves the accuracy and diversity of the generated captions and captures image features more effectively. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
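As a rough illustration of why dilation enlarges the receptive field mentioned in the abstract, the following minimal Python sketch computes the receptive field of a stack of stride-1 convolutions and applies a single 1-D dilated convolution. It is not the authors' implementation: the function names `receptive_field` and `dilated_conv1d`, the layer configurations, and the sample signal are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names and layer settings are hypothetical,
# not taken from the paper.

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs, one per layer.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


# Three ordinary 3-tap layers versus three layers with dilations 1, 2, 4.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)

# With dilation 2, a 3-tap kernel reads samples 0, 2 and 4 of the input.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))
```

Under these assumptions, the dilated stack covers a wider input window than the plain stack with the same number of weights, which is the general motivation for dilation. The specific dilation rates used in the paper are not stated in the abstract.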
Related papers
50 in total
  • [1] AttResNet: Attention-based ResNet for Image Captioning
    Feng, Yunmeng
    Lan, Long
    Zhang, Xiang
    Xu, Chuanfu
    Wang, Zhenghua
    Luo, Zhigang
    [J]. 2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018
  • [2] Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention
    Chu, Yan
    Yue, Xiao
    Yu, Lei
    Sergei, Mikhailov
    Wang, Zhengkui
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2020, 2020
  • [3] Hybrid Dilated Convolution with Attention Mechanisms for Image Denoising
    Bian, Shengqin
    He, Xinyu
    Xu, Zhengguang
    Zhang, Lixin
    [J]. ELECTRONICS, 2023, 12 (18)
  • [4] Image Segmentation with Pyramid Dilated Convolution Based on ResNet and U-Net
    Zhang, Qiao
    Cui, Zhipeng
    Niu, Xiaoguang
    Geng, Shijie
    Qiao, Yu
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635: 364-372
  • [5] Depthwise Separable Convolution ResNet with attention mechanism for Alzheimer's detection
    Kadri, Rahma
    Bouaziz, Bassem
    Tmar, Mohamed
    Gargouri, Faiez
    [J]. 2022 INTERNATIONAL CONFERENCE ON TECHNOLOGY INNOVATIONS FOR HEALTHCARE, ICTIH, 2022: 47-52
  • [6] An ensemble model with attention based mechanism for image captioning
    Al Badarneh, Israa
    Hammo, Bassam H.
    Al-Kadi, Omar
    [J]. Computers and Electrical Engineering, 2025, 123
  • [7] Attention on Attention for Image Captioning
    Huang, Lun
    Wang, Wenmin
    Chen, Jie
    Wei, Xiao-Yong
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4633-4642
  • [8] An Image Captioning Algorithm Based on Combination Attention Mechanism
    Liu, Jinlong
    Cheng, Kangda
    Jin, Haiyan
    Wu, Zhilu
    [J]. ELECTRONICS, 2022, 11 (09)
  • [9] Reference Based on Adaptive Attention Mechanism for Image Captioning
    Liu, Shuang
    Bai, Liang
    Guo, Yanming
    Wang, Haoran
    [J]. 2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018
  • [10] A novel ResNet101 model based on dense dilated convolution for image classification
    Zhang, Qi
    [J]. SN Applied Sciences, 2022, 4