Fine-grained Image Caption based on Multi-level Attention

Authors
Yang Zhenyu [1]
Zhang Jiao [1]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, 3501 Univ Rd Changqing Dist, Jinan 250353, Shandong, Peoples R China
Keywords
image caption; multi-level attention; feature extraction; text generation
DOI
10.1109/iccsnt47585.2019.8962488
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Image captioning has improved steadily, but problems remain. A complete description should contain different forms of information, including fine-grained information and labels. To address this, we propose a fine-grained image captioning method based on multi-level attention that performs both label prediction and description of fine-grained images. (1) We first use a visual attention mechanism to fuse the global features and the local fine-grained features. (2) A joint attention mechanism then fuses the visual features and the label features of the image to generate a text description for a specific region of the image. (3) Finally, we use an attention-based Long Short-Term Memory (LSTM) network, a language generation model, to generate the fine-grained image caption.
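The first two steps of the abstract can be illustrated with a minimal sketch of soft attention in plain Python. This is not the authors' implementation: the abstract does not give the scoring function, so scaled dot-product scoring is assumed here, and the names `attend`, `global_feat`, `local_feats`, `label_feat`, and `hidden` (a stand-in for the LSTM hidden state) are hypothetical, as are the toy feature values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Soft attention: score each feature vector against the query
    (scaled dot product, an assumption), normalize the scores with
    softmax, and return the weighted sum plus the weights."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Toy inputs (hypothetical values, 3-dimensional for readability).
global_feat = [0.2, 0.1, 0.4]                      # whole-image feature
local_feats = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.3]]   # fine-grained region features
label_feat = [0.0, 1.0, 0.0]                       # embedded predicted label
hidden = [1.0, 0.0, 0.0]                           # stand-in for the LSTM hidden state

# Step (1): visual attention fuses the global and local features.
visual, visual_w = attend(hidden, [global_feat] + local_feats)

# Step (2): joint attention fuses the visual feature with the label feature.
context, joint_w = attend(hidden, [visual, label_feat])
```

In a full model, `context` would be fed to the attention-based LSTM decoder at each time step, with `hidden` updated after every generated word. The decoder itself is omitted here.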
Pages: 72-78
Page count: 7