Fine-grained attention for image caption generation

Author
Yan-Shuo Chang
Affiliations
[1] China (Xi'an) Institute for Silk Road Research, School of Information
[2] Xi'an University of Finance and Economics
Keywords
Fine-grained attention; Image caption generation; Attention generation
Abstract
Despite recent progress, generating natural language descriptions for images remains a challenging task. Most state-of-the-art methods apply existing deep convolutional neural network (CNN) models to extract a visual representation of the entire image, and then use recurrent neural networks to exploit the parallel structure between images and sentences. An inherent drawback of these models is that they may attend to only a partial view of a visual element, or to a conglomeration of several concepts. In this paper, we present a fine-grained attention model built on a deep recurrent architecture that combines recent advances in computer vision and machine translation. The model contains three sub-networks: a deep recurrent neural network for sentences, a deep convolutional network for images, and a region proposal network that produces nearly cost-free region proposals. The model automatically learns to fix its gaze on salient region proposals, so that generating the next word, given the previously generated ones, is aligned with this visual perception experience. We validate the proposed model on three benchmark datasets (Flickr 8K, Flickr 30K and MS COCO), and the experimental results confirm its effectiveness.
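The abstract does not spell out how the attention over region proposals is computed. As a rough illustration only, the sketch below assumes a standard additive soft attention: each region proposal has one feature vector, the decoder state scores every region, and the weighted sum becomes the visual context for the next word. All names (`attend_regions`, `W_r`, `W_h`, `v`), the dimensions, and the random inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_regions(region_feats, hidden, W_r, W_h, v):
    """Additive soft attention over region-proposal features (assumed form).

    region_feats: (num_regions, feat_dim), one feature vector per proposal
    hidden:       (hidden_dim,), current decoder state
    Returns (context, weights).
    """
    # Score each region against the current decoder state.
    scores = np.tanh(region_feats @ W_r + hidden @ W_h) @ v  # (num_regions,)
    weights = softmax(scores)            # non-negative, sums to 1
    context = weights @ region_feats     # (feat_dim,) weighted region summary
    return context, weights

# Hypothetical sizes and random stand-ins for learned parameters.
rng = np.random.default_rng(0)
num_regions, feat_dim, hidden_dim, attn_dim = 5, 8, 6, 4
region_feats = rng.normal(size=(num_regions, feat_dim))
hidden = rng.normal(size=hidden_dim)
W_r = rng.normal(size=(feat_dim, attn_dim))
W_h = rng.normal(size=(hidden_dim, attn_dim))
v = rng.normal(size=attn_dim)

context, weights = attend_regions(region_feats, hidden, W_r, W_h, v)
```

In a full captioning model, `context` would be fed to the recurrent decoder together with the previous word at each step; here the parameters are random, so the weights only demonstrate the mechanics.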
Pages: 2959-2971
Number of pages: 12