Learn and Tell: Learning Priors for Image Caption Generation

Cited by: 1
|
Authors
Liu, Pei [1 ,2 ,5 ]
Peng, Dezhong [1 ,3 ]
Zhang, Ming [4 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Nanjing Univ Aeronaut & Astronaut, Coll Econ & Management, Nanjing 211106, Peoples R China
[5] Dept Elect & Comp Engn, 968 Ctr Dr, Gainesville, FL 32611 USA
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Iss. 19
Funding
National Natural Science Foundation of China;
Keywords
image captioning; image understanding; probability-being-mentioned prior; part-of-speech prior; LANGUAGE;
DOI
10.3390/app10196942
CLC Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which incorporates two kinds of priors into the visual information extraction process at each word-prediction step: the probabilities of local region proposals being mentioned (PBM priors) and part-of-speech clues for caption words (POS priors). This work was inspired by the intuitions that region proposals have different inherent probabilities of being mentioned in a caption, and that POS clues bridge the word class (part-of-speech tag) with the categories of visual features. We propose new methods to extract these two priors: the PBM priors are obtained by computing the similarities between the caption feature vector and the local feature vectors, while the POS priors are predicted at each step of word generation by taking the hidden state of the decoder as input. These two kinds of priors are then incorporated into the PANN module of the decoder to help it extract more accurate visual information for the current word prediction. In our experiments, we qualitatively analyze the proposed approach and quantitatively evaluate several captioning schemes with our PANN on the MS-COCO dataset. Experimental results show that the proposed method achieves better performance and demonstrate the effectiveness of the proposed network for image captioning.
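As a rough illustration of the two priors described in the abstract, the following is a minimal NumPy sketch: PBM priors as normalized similarities between a global caption feature and region features, and POS priors as a projection of the decoder hidden state. The feature dimensions, the projection matrix `W_pos`, and the way the priors modulate attention here are illustrative assumptions, not the paper's actual PANN module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pbm_priors(caption_vec, region_feats):
    # Cosine similarity between the caption feature vector and each
    # local region feature, normalized into a prior distribution.
    c = caption_vec / np.linalg.norm(caption_vec)
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    return softmax(r @ c)

def pos_priors(hidden_state, W_pos):
    # POS clues predicted from the decoder hidden state at each step
    # (W_pos stands in for a learned projection; random here).
    return softmax(W_pos @ hidden_state)

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))   # 5 region proposals, 8-dim features
caption = rng.normal(size=8)        # global caption feature vector
hidden = rng.normal(size=16)        # decoder hidden state at one step
W_pos = rng.normal(size=(4, 16))    # 4 coarse POS classes (assumed)

pbm = pbm_priors(caption, regions)  # one weight per region proposal
pos = pos_priors(hidden, W_pos)     # one weight per POS class

# Both priors would then modulate the attention over region features
# when predicting the next word; a plain PBM-weighted sum is shown.
attended = (pbm[:, None] * regions).sum(axis=0)
```

In the paper the two priors are combined inside the decoder's attention module rather than applied as a simple weighted sum; this sketch only shows how each prior can be derived from the quantities the abstract names.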
Pages: 1-17
Page count: 17
Related Papers
50 records in total
  • [1] Tell and guess: cooperative learning for natural image caption generation with hierarchical refined attention
    Zhang, Wenqiao
    Tang, Siliang
    Su, Jiajie
    Xiao, Jun
    Zhuang, Yueting
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16267 - 16282
  • [2] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
    Xu, Kelvin
    Ba, Jimmy Lei
    Kiros, Ryan
    Cho, Kyunghyun
    Courville, Aaron
    Salakhutdinov, Ruslan
    Zemel, Richard S.
    Bengio, Yoshua
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 2048 - 2057
  • [3] Automatic image caption generation using deep learning
    Verma, Akash
    Yadav, Arun Kumar
    Kumar, Mohit
    Yadav, Divakar
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 5309 - 5325
  • [4] Image Caption Generation using Deep Learning Technique
    Amritkar, Chetan
    Jabade, Vaishali
    [J]. 2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [5] Show, tell and rectify: Boost image caption generation via an output rectifier
    Ge, Guowei
    Han, Yufeng
    Hao, Lingguang
    Hao, Kuangrong
    Wei, Bing
    Tang, Xue-song
    [J]. NEUROCOMPUTING, 2024, 585
  • [6] Show and Tell: A Neural Image Caption Generator
    Vinyals, Oriol
    Toshev, Alexander
    Bengio, Samy
    Erhan, Dumitru
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 3156 - 3164
  • [7] A Hindi Image Caption Generation Framework Using Deep Learning
    Mishra, Santosh Kumar
    Dhir, Rijul
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (02)
  • [8] Learning cross-modality features for image caption generation
    Zeng, Chao
    Kwong, Sam
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (07) : 2059 - 2070