c-RNN: A Fine-Grained Language Model for Image Captioning

Cited by: 9
Authors
Huang, Gengshi [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat-sen University, School of Electronics and Information Engineering, Guangzhou 510006, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Character-level; Convolutional Neural Network; Recurrent Neural Network; Sequence learning;
DOI
10.1007/s11063-018-9836-2
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Previous captioning methods built on the conventional deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architecture follow the machine-translation paradigm with word-level modelling. Word-level modelling, however, requires segmenting sentences into words, and designing an optimal word segmentation algorithm is a very difficult task. In this paper, we build a character-level RNN (c-RNN) that models captions directly at the character level, composing each descriptive sentence as a stream of characters. The c-RNN performs the language task at a finer granularity and naturally avoids the word segmentation issue. It enables the language model to dynamically reason about word spelling as well as grammatical rules, which results in expressive and elaborate sentences. We optimize the network parameters by maximizing the probability of the correctly generated character sequences. Quantitative and qualitative experiments on the popular MSCOCO and Flickr30k datasets show that the c-RNN describes images at a considerably faster speed and with satisfactory quality.
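The abstract describes a CNN encoder feeding a character-level RNN decoder that is trained by maximizing the probability of the reference character sequence. The snippet below is a minimal sketch of that general idea, not the paper's actual implementation: it assumes an LSTM decoder, a single global CNN feature vector used to initialize the recurrent state, and teacher-forced per-character cross-entropy; all module names, dimensions, and the `CharCaptionDecoder` class itself are illustrative.

```python
# Minimal character-level caption decoder sketch (assumed architecture, not the paper's code).
import torch
import torch.nn as nn


class CharCaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # one embedding per character
        self.init_h = nn.Linear(feat_dim, hidden_dim)      # map CNN feature -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)      # map CNN feature -> initial cell state
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)        # per-step character logits

    def forward(self, image_feat, char_ids):
        # image_feat: (B, feat_dim) global CNN feature; char_ids: (B, T) character indices
        h0 = torch.tanh(self.init_h(image_feat)).unsqueeze(0)  # (1, B, hidden_dim)
        c0 = torch.tanh(self.init_c(image_feat)).unsqueeze(0)
        emb = self.embed(char_ids)                              # (B, T, embed_dim)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                                 # (B, T, vocab_size)


def caption_loss(model, image_feat, char_ids):
    # Maximize the probability of the ground-truth character sequence, i.e. minimize
    # per-character cross-entropy with teacher forcing (inputs shifted by one step).
    logits = model(image_feat, char_ids[:, :-1])
    targets = char_ids[:, 1:]
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```

One design note, under the same assumptions: a character vocabulary (letters, digits, punctuation, space) is far smaller than a word vocabulary, so the per-step output softmax is tiny, at the cost of longer sequences per caption.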
Pages: 683-691
Number of pages: 9
Related Papers
50 records
  • [1] c-RNN: A Fine-Grained Language Model for Image Captioning
    Gengshi Huang
    Haifeng Hu
    [J]. Neural Processing Letters, 2019, 49 : 683 - 691
  • [2] Fine-Grained Features for Image Captioning
    Shao, Mengyue
    Feng, Jie
    Wu, Jie
    Zhang, Haixiang
    Zheng, Yayu
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 4697 - 4712
  • [3] FineFormer: Fine-Grained Adaptive Object Transformer for Image Captioning
    Wang, Bo
    Zhang, Zhao
    Fan, Jicong
    Zhao, Mingbo
    Zhan, Choujun
    Xu, Mingliang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 508 - 517
  • [4] Fine-grained and Semantic-guided Visual Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1709 - 1717
  • [5] Fine-Grained Image Captioning With Global-Local Discriminative Objective
    Wu, Jie
    Chen, Tianshui
    Wu, Hefeng
    Yang, Zhi
    Luo, Guangchun
    Lin, Liang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2413 - 2427
  • [6] Fine-grained image emotion captioning based on Generative Adversarial Networks
    Yang, Chunmiao
    Wang, Yang
    Han, Liying
    Jia, Xiran
    Sun, Hebin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (34) : 81857 - 81875
  • [7] Fine-grained Video Captioning for Sports Narrative
    Yu, Huanyu
    Cheng, Shuo
    Ni, Bingbing
    Wang, Minsi
    Zhang, Jian
    Yang, Xiaokang
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6006 - 6015
  • [8] ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Hossain, Mohammad Alamgir
    [J]. DISPLAYS, 2024, 84
  • [9] A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
    Liu, An-An
    Qiu, Yurui
    Wong, Yongkang
    Su, Yu-Ting
    Kankanhalli, Mohan
    [J]. IEEE ACCESS, 2018, 6 : 68463 - 68471
  • [10] REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
    Jiang, Ming
    Hu, Junjie
    Huang, Qiuyuan
    Zhang, Lei
    Diesner, Jana
    Gao, Jianfeng
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1475 - 1480