Fine-grained attention for image caption generation

Cited by: 0
Author
Yan-Shuo Chang
Affiliations
[1] China (Xi’an) Institute for Silk Road Research, School of Information
[2] Xi’an University of Finance and Economics
Source
Multimedia Tools and Applications, 2018, 77 (03)
Keywords
Fine-grained attention; Image caption generation; Attention generation
DOI
Not available
Abstract
Despite recent progress, generating natural language descriptions for images remains a challenging task. Most state-of-the-art methods apply existing deep convolutional neural network (CNN) models to extract a visual representation of the entire image, and then exploit the parallel structures between images and sentences using recurrent neural networks. However, these models have an inherent drawback: they may attend to a partial view of a visual element or to a conglomeration of several concepts. In this paper, we present a fine-grained attention model built on a deep recurrent architecture that combines recent advances in computer vision and machine translation. The model contains three sub-networks: a deep recurrent neural network for sentences, a deep convolutional network for images, and a region proposal network for nearly cost-free region proposals. The model automatically learns to fix its gaze on salient region proposals, so that generating the next word, given the previously generated ones, is aligned with this visual perception experience. We validate the model on three benchmark datasets (Flickr 8K, Flickr 30K and MS COCO), and the experimental results confirm its effectiveness.
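The abstract does not give the attention formulation, so the following is only an illustrative sketch of the general idea of attending over region proposals at each decoding step. It assumes generic dot-product soft attention, which may differ from the paper's actual method. The names `attend`, `region_features`, and `hidden_state` are hypothetical, and the toy vectors stand in for CNN features of proposed regions and the recurrent decoder state.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    region_features: Sequence[Sequence[float]],
    hidden_state: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Soft attention over region proposals (assumed dot-product scoring).

    Returns the attention weights (one per region) and the context vector,
    i.e. the weighted sum of region features that a decoder could use
    when predicting the next word.
    """
    # Score each region by its similarity to the current decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)

    # Context vector: attention-weighted average of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three region proposals with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]  # hypothetical decoder state at one time step
weights, context = attend(regions, hidden)
```

In this toy setup the first region is most similar to the decoder state, so it receives the largest weight and dominates the context vector. In a full captioning model the scoring function would be learned rather than a fixed dot product.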
Pages: 2959-2971
Page count: 12
Related papers
50 in total
  • [1] Fine-grained attention for image caption generation
    Chang, Yan-Shuo
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (03) : 2959 - 2971
  • [2] Fine-grained Image Caption based on Multi-level Attention
    Yang Zhenyu
    Zhang Jiao
    [J]. PROCEEDINGS OF 2019 IEEE 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2019), 2019, : 72 - 78
  • [3] Fine-Grained Image Quality Caption With Hierarchical Semantics Degradation
    Yang, Wen
    Wu, Jinjian
    Tian, Shiwei
    Li, Leida
    Dong, Weisheng
    Shi, Guangming
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 3578 - 3590
  • [4] Text-to-Image Generation Grounded by Fine-Grained User Attention
    Koh, Jing Yu
    Baldridge, Jason
    Lee, Honglak
    Yang, Yinfei
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 237 - 246
  • [5] Learning Cascade Attention for fine-grained image classification
    Zhu, Youxiang
    Li, Ruochen
    Yang, Yin
    Ye, Ning
    [J]. NEURAL NETWORKS, 2020, 122 : 174 - 182
  • [6] Adversarial erasing attention for fine-grained image classification
    Ji, Jinsheng
    Jiang, Linfeng
    Zhang, Tao
    Zhong, Weilin
    Xiong, Huilin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (15) : 22867 - 22889
  • [7] Aggregate attention module for fine-grained image classification
    Xingmei Wang
    Jiahao Shi
    Hamido Fujita
    Yilin Zhao
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 8335 - 8345
  • [8] Adversarial erasing attention for fine-grained image classification
    Jinsheng Ji
    Linfeng Jiang
    Tao Zhang
    Weilin Zhong
    Huilin Xiong
    [J]. Multimedia Tools and Applications, 2021, 80 : 22867 - 22889
  • [9] Aggregate attention module for fine-grained image classification
    Wang, Xingmei
    Shi, Jiahao
    Fujita, Hamido
    Zhao, Yilin
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 14 (7) : 8335 - 8345
  • [10] Subtler mixed attention network on fine-grained image classification
    Liu, Chao
    Huang, Lei
    Wei, Zhiqiang
    Zhang, Wenfeng
    [J]. APPLIED INTELLIGENCE, 2021, 51 (11) : 7903 - 7916