Image Caption Generation with Part of Speech Guidance

Cited by: 47
Authors
He, Xinwei [1]
Shi, Baoguang [1 ]
Bai, Xiang [1 ]
Xia, Gui-Song [2 ]
Zhang, Zhaoxiang [3 ]
Dong, Weisheng [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Wuhan Univ, State Key Lab, LIESMARS, Wuhan 430079, Hubei, Peoples R China
[3] Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image caption generation; Part-of-speech tags; Long Short-Term Memory; Visual attributes;
DOI
10.1016/j.patrec.2017.10.018
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a fundamental problem in image understanding, image caption generation has attracted much attention from both the computer vision and natural language processing communities. In this paper, we focus on how to exploit the structural information of a natural sentence used to describe the content of an image. We find that the Part of Speech (PoS) tags of a sentence are effective cues for guiding a Long Short-Term Memory (LSTM) based word generator. More specifically, given a sentence, the PoS tag of each word is used to determine whether it is essential to input the image representation into the word generator. Benefiting from this strategy, our model can closely connect the visual attributes of an image to the corresponding word concepts in the natural language space. Experimental results on the most popular benchmark datasets, e.g., Flickr30k and MS COCO, consistently demonstrate that our method can significantly enhance the performance of a standard image caption generation model and achieve competitive results. (C) 2017 Elsevier B.V. All rights reserved.
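The abstract describes the PoS-guidance mechanism only at a high level. The following minimal PyTorch sketch illustrates one plausible reading of it: at each decoding step, the current word's PoS tag gates whether the image representation is fed into the LSTM word generator. The class name PoSGuidedDecoder, the tag set VISUAL_TAGS, all dimensions, and the zero-gating rule are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

# Assumed set of "visual" PoS tags (nouns, adjectives, base verbs);
# the paper's actual tag partition may differ.
VISUAL_TAGS = {"NN", "NNS", "JJ", "VB"}

class PoSGuidedDecoder(nn.Module):
    """LSTM word generator whose image input is gated by PoS tags (sketch)."""

    def __init__(self, vocab_size, embed_dim=256, img_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Each step consumes the word embedding concatenated with the
        # (possibly zeroed) image representation.
        self.lstm = nn.LSTMCell(embed_dim + img_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, word_ids, pos_tags):
        # img_feat: (B, img_dim); word_ids: (B, T) ground-truth words;
        # pos_tags: length-T list of tag strings (shared across the batch
        # here purely to keep the sketch short).
        batch, steps = word_ids.shape
        h = img_feat.new_zeros(batch, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(steps):
            x = self.embed(word_ids[:, t])
            # PoS guidance: feed the image representation only when the
            # current word's tag marks a visually grounded concept.
            gate = 1.0 if pos_tags[t] in VISUAL_TAGS else 0.0
            h, c = self.lstm(torch.cat([x, gate * img_feat], dim=1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size)
```

Zeroing the image channel (rather than dropping it) keeps the LSTM input size fixed across steps, so non-visual words such as determiners and prepositions are predicted from language context alone while visual words still see the image.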
Pages: 229-237
Page count: 9
Related Papers
50 items in total
  • [31] Modeling coverage with semantic embedding for image caption generation
    Jiang, Teng
    Zhang, Zehan
    Yang, Yupu
    [J]. VISUAL COMPUTER, 2019, 35(11): 1655-1665
  • [32] Neural Image Caption Generation with Weighted Training and Reference
    Ding, Guiguang
    Chen, Minghai
    Zhao, Sicheng
    Chen, Hui
    Han, Jungong
    Liu, Qiang
    [J]. COGNITIVE COMPUTATION, 2019, 11(6): 763-777
  • [33] Fine-grained attention for image caption generation
    Yan-Shuo Chang
    [J]. Multimedia Tools and Applications, 2018, 77: 2959-2971
  • [34] A transformer-based Urdu image caption generation
    Muhammad Hadi
    Iqra Safder
    Hajra Waheed
    Farooq Zaman
    Naif Radi Aljohani
    Raheel Nawaz
    Saeed Ul Hassan
    Raheem Sarwar
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2024, 15(9): 3441-3457
  • [36] Boosting image caption generation with feature fusion module
    Xia, Pengfei
    He, Jingsong
    Yin, Jin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79(33-34): 24225-24239
  • [37] Learn and Tell: Learning Priors for Image Caption Generation
    Liu, Pei
    Peng, Dezhong
    Zhang, Ming
    [J]. APPLIED SCIENCES-BASEL, 2020, 10(19): 1-17
  • [38] Automatic image caption generation using deep learning
    Akash Verma
    Arun Kumar Yadav
    Mohit Kumar
    Divakar Yadav
    [J]. Multimedia Tools and Applications, 2024, 83: 5309-5325
  • [39] Image caption generation using a dual attention mechanism
    Padate, Roshni
    Jain, Amit
    Kalla, Mukesh
    Sharma, Arvind
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [40] Image difference caption generation with text information assistance
    Chen, Weijing
    Wang, Weiying
    Jin, Qin
    [J]. Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1436-1444