Image Caption Generation with Part of Speech Guidance

Cited by: 47
Authors
He, Xinwei [1 ]
Shi, Baoguang [1 ]
Bai, Xiang [1 ]
Xia, Gui-Song [2 ]
Zhang, Zhaoxiang [3 ]
Dong, Weisheng [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Wuhan Univ, State Key Lab, LIESMARS, Wuhan 430079, Hubei, Peoples R China
[3] Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image caption generation; Part-of-speech tags; Long Short-Term Memory; Visual attributes;
DOI
10.1016/j.patrec.2017.10.018
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
As a fundamental problem in image understanding, image caption generation has attracted much attention from both the computer vision and natural language processing communities. In this paper, we focus on how to exploit the structural information of a natural sentence that is used to describe the content of an image. We discover that the Part of Speech (PoS) tags of a sentence are very effective cues for guiding the Long Short-Term Memory (LSTM) based word generator. More specifically, given a sentence, the PoS tag of each word is used to determine whether it is essential to input the image representation into the word generator. Benefiting from such a strategy, our model can closely connect the visual attributes of an image to the word concepts in the natural language space. Experimental results on the most popular benchmark datasets, e.g., Flickr30k and MS COCO, consistently demonstrate that our method can significantly enhance the performance of a standard image caption generation model and achieve competitive results. (C) 2017 Elsevier B.V. All rights reserved.
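The PoS-guided decoding step described in the abstract can be sketched as a gated LSTM update. The snippet below is a minimal illustration, not the authors' implementation: the class name PosGuidedDecoder, the pos_is_visual mask, the feature dimensions, and the specific gating rule (feeding the image feature only at words whose PoS tags are treated as "visual", e.g., nouns or adjectives) are assumptions made for clarity.

# Minimal sketch (assumed, not the paper's code) of a PoS-gated LSTM decoder step.
import torch
import torch.nn as nn

class PosGuidedDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, img_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM input = word embedding concatenated with a (possibly zeroed) image feature.
        self.lstm = nn.LSTMCell(embed_dim + img_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, pos_is_visual, img_feat, state):
        # word_ids: (B,) previous word indices
        # pos_is_visual: (B,) bool mask derived from PoS tags (assumed gating rule)
        # img_feat: (B, img_dim) global image representation
        # state: (h, c) tuple of LSTM hidden and cell states
        emb = self.embed(word_ids)
        # Only steps whose PoS tag is marked "visual" receive the image feature;
        # all other steps get a zero vector in its place.
        gated_img = img_feat * pos_is_visual.float().unsqueeze(1)
        h, c = self.lstm(torch.cat([emb, gated_img], dim=1), state)
        return self.fc(h), (h, c)

In this reading, the PoS tags act as a hard gate on the visual input, so the image's visual attributes are injected exactly at the word positions where they matter.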
Pages: 229 - 237
Page count: 9
Related papers
50 records
  • [41] Cascade recurrent neural network for image caption generation
    Wu, Jie
    Hu, Haifeng
    [J]. ELECTRONICS LETTERS, 2017, 53 (25) : 1642 - 1643
  • [42] Visual Image Caption Generation for Service Robotics and Industrial Applications
    Luo, Ren C.
    Hsu, Yu-Ting
    Wen, Yu-Cheng
    Ye, Huan-Jun
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER PHYSICAL SYSTEMS (ICPS 2019), 2019, : 827 - 832
  • [43] Image Caption Automatic Generation Method Based on Weighted Feature
    Xi, Su Mei
    Cho, Young Im
    [J]. 2013 13TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2013), 2013, : 548 - 551
  • [44] Automatic Image Caption Generation Using ResNet & Torch Vision
    Verma, Vijeta
    Saritha, Sri Khetwat
    Jain, Sweta
    [J]. MACHINE LEARNING, IMAGE PROCESSING, NETWORK SECURITY AND DATA SCIENCES, MIND 2022, PT II, 2022, 1763 : 82 - 101
  • [45] Scene Attention Mechanism for Remote Sensing Image Caption Generation
    Wu, Shiqi
    Zhang, Xiangrong
    Wang, Xin
    Li, Chen
    Jiao, Licheng
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [46] A Hindi Image Caption Generation Framework Using Deep Learning
    Mishra, Santosh Kumar
    Dhir, Rijul
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (02)
  • [47] Image Caption Generation via Unified Retrieval and Generation-Based Method
    Zhao, Shanshan
    Li, Lixiang
    Peng, Haipeng
    Yang, Zihang
    Zhang, Jiaxuan
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (18):
  • [48] Learning cross-modality features for image caption generation
    Zeng, Chao
    Kwong, Sam
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (07) : 2059 - 2070
  • [49] FFGS: Feature Fusion with Gating Structure for Image Caption Generation
    Yuan, Aihong
    Li, Xuelong
    Lu, Xiaoqiang
    [J]. COMPUTER VISION, PT I, 2017, 771 : 638 - 649
  • [50] Image caption generation method based on adaptive attention mechanism
    Jin, Huazhong
    Wu, Yu
    Wan, Fang
    Hu, Man
    Li, Qingqing
    [J]. MIPPR 2019: PATTERN RECOGNITION AND COMPUTER VISION, 2020, 11430