AttResNet: Attention-based ResNet for Image Captioning

Cited by: 0
Authors
Feng, Yunmeng [1 ]
Lan, Long [2 ,3 ]
Zhang, Xiang [2 ,3 ]
Xu, Chuanfu [2 ,3 ]
Wang, Zhenghua [1 ]
Luo, Zhigang [1 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Lab, Coll Comp, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Inst Quantum Informat, Changsha, Hunan, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, State Key Lab High Performance Comp, Changsha, Hunan, Peoples R China
Keywords
Image caption; ResNet; Attention-based mechanism
DOI
10.1145/3302425.3302464
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Image captioning, which learns high-level semantic descriptions of a single image much as a human understands a scene, has been widely studied in recent years. To achieve this goal, many recent methods divide the task into two stages, an encoder and a decoder, corresponding to feature extraction and semantic description, respectively. With the development of deep neural networks, the two stages can be realized with a convolutional neural network (CNN) followed by a recurrent neural network (RNN). Following this deep encoder-decoder framework, this paper mainly refines the encoder with an attention-based ResNet model to provide better semantic features for the decoder. The attention mechanism has been broadly recognized as a useful strategy in image captioning: it highlights image regions of interest and emphasizes the corresponding semantics to enhance the captioning of image content. Specifically, we design an attention connection and seamlessly couple it with the well-known ResNet; we therefore call the model AttResNet. To the best of our knowledge, this is the first attempt to apply ResNet to image captioning. Experiments on the MSCOCO dataset validate that the proposed model achieves favorable results.
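The abstract describes the standard two-stage pipeline (CNN encoder, RNN decoder) and an attention connection inserted into the encoder. As a rough illustration only, the following PyTorch-style sketch shows one way an attention-weighted ResNet encoder could feed an LSTM decoder; the class names, the 1x1-convolution spatial attention, the ResNet-50 backbone, and all dimensions are assumptions made for exposition, not the authors' actual AttResNet design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class AttentionEncoder(nn.Module):
    """ResNet backbone with a 1x1-conv spatial attention map (illustrative, not the paper's exact design)."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        resnet = models.resnet50()                                      # backbone choice is an assumption
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])    # drop avgpool + fc, keep feature map
        self.att = nn.Conv2d(feat_dim, 1, kernel_size=1)                # one attention weight per spatial location

    def forward(self, images):                          # images: (B, 3, H, W)
        feats = self.backbone(images)                   # (B, 2048, h, w)
        weights = torch.sigmoid(self.att(feats))        # (B, 1, h, w) spatial attention
        attended = feats * weights                      # emphasize salient regions
        return attended.flatten(2).mean(-1)             # (B, 2048) pooled semantic feature

class CaptionDecoder(nn.Module):
    """LSTM decoder conditioned on the pooled image feature (illustrative)."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # image feature initializes the LSTM state
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, captions):              # captions: (B, T) token ids
        h0 = self.init_h(img_feat).unsqueeze(0)         # (1, B, hidden_dim)
        c0 = self.init_c(img_feat).unsqueeze(0)
        hidden, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(hidden)                         # (B, T, vocab_size) word logits
```

In such a setup, the decoder's word logits would typically be trained against ground-truth captions with cross-entropy, as is standard for encoder-decoder captioning models.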
Pages: 6