Image Caption Generation with Part of Speech Guidance

Cited by: 47
Authors
He, Xinwei [1 ]
Shi, Baoguang [1 ]
Bai, Xiang [1 ]
Xia, Gui-Song [2 ]
Zhang, Zhaoxiang [3 ]
Dong, Weisheng [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Wuhan Univ, State Key Lab, LIESMARS, Wuhan 430079, Hubei, Peoples R China
[3] Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image caption generation; Part-of-speech tags; Long Short-Term Memory; Visual attributes;
DOI
10.1016/j.patrec.2017.10.018
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a fundamental problem in image understanding, image caption generation has attracted much attention from both the computer vision and natural language processing communities. In this paper, we focus on how to exploit the structural information of a natural sentence, which is used to describe the content of an image. We discover that the Part of Speech (PoS) tags of a sentence are very effective cues for guiding the Long Short-Term Memory (LSTM) based word generator. More specifically, given a sentence, the PoS tag of each word is used to determine whether it is essential to input the image representation into the word generator. Benefiting from such a strategy, our model can closely connect the visual attributes of an image to the word concepts in the natural language space. Experimental results on the most popular benchmark datasets, e.g., Flickr30k and MS COCO, consistently demonstrate that our method can significantly enhance the performance of a standard image caption generation model and achieve competitive results. (C) 2017 Elsevier B.V. All rights reserved.
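The abstract gives no implementation details, but the gating idea can be illustrated with a minimal sketch. The PyTorch snippet below is an assumption-laden illustration, not the authors' model: it assumes a fixed set of "visual" PoS tags (nouns, adjectives, numerals), a linear projection of the CNN image feature, and simple additive injection into the LSTM input at steps whose word carries a visual tag.

```python
import torch
import torch.nn as nn

# Assumed set of "visual" PoS tags (Penn Treebank style); the paper's exact
# tag grouping is not specified in the abstract.
VISUAL_POS = {"NN", "NNS", "NNP", "JJ", "CD"}

class PoSGuidedDecoder(nn.Module):
    """Minimal sketch of a PoS-guided LSTM word generator (illustrative only)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(img_dim, embed_dim)  # project CNN image feature
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, pos_tag, img_feat, state):
        # word_ids: (batch,) previous word indices; img_feat: (batch, img_dim)
        x = self.embed(word_ids)
        if pos_tag in VISUAL_POS:
            # Inject the visual representation only when the PoS tag marks a
            # word grounded in image content (assumed gating rule).
            x = x + self.img_proj(img_feat)
        h, c = self.lstm(x, state)
        return self.out(h), (h, c)

# Tiny usage example with random tensors.
dec = PoSGuidedDecoder(vocab_size=1000)
h0, c0 = torch.zeros(1, 512), torch.zeros(1, 512)
logits, state = dec.step(torch.tensor([5]), "NN", torch.randn(1, 2048), (h0, c0))
```

How the image feature enters the generator (addition, concatenation, or gating) and which tags count as "visual" are design choices the abstract leaves open; the sketch only shows the overall control flow of feeding visual input conditionally on the PoS tag.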
Pages: 229-237 (9 pages)
Related papers
50 records in total
  • [1] The Accurate Guidance for Image Caption Generation
    Qi, Xinyuan
    Cao, Zhiguo
    Xiao, Yang
    Wang, Jian
    Zhang, Chao
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 15 - 26
  • [2] Integrating Part of Speech Guidance for Image Captioning
    Zhang, Ji
    Mei, Kuizhi
    Zheng, Yu
    Fan, Jianping
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 92 - 104
  • [3] TVPRNN for image caption generation
    Yang, Liang
    Hu, Haifeng
    [J]. ELECTRONICS LETTERS, 2017, 53 (22) : 1471 - +
  • [4] CNN image caption generation
    Li, Yong
    Cheng, Honghong
    Liang, Xinyan
    Guo, Qian
    Qian, Yuhua
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2019, 46 (02): 152 - 157
  • [5] Reducing Unknown Unknowns with Guidance in Image Caption
    Ni, Mengjun
    Yang, Jing
    Lin, Xin
    He, Liang
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 547 - 555
  • [6] Image Caption Generation With Adaptive Transformer
    Zhang, Wei
    Nie, Wenbo
    Li, Xinle
    Yu, Yao
    [J]. 2019 34RD YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2019, : 521 - 526
  • [7] An Overview of Image Caption Generation Methods
    Wang, Haoran
    Zhang, Yue
    Yu, Xiaosheng
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [8] A survey on automatic image caption generation
    Bai, Shuang
    An, Shan
    [J]. NEUROCOMPUTING, 2018, 311 : 291 - 304
  • [9] Image caption generation with dual attention mechanism
    Liu, Maofu
    Li, Lingjun
    Hu, Huijun
    Guan, Weili
    Tian, Jing
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (02)
  • [10] Image Caption Generation Using A Deep Architecture
    Hani, Ansar
    Tagougui, Najiba
    Kherallah, Monji
    [J]. 2019 INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2019, : 246 - 251