IMAGE-TO-TREE: A TREE-STRUCTURED DECODER FOR IMAGE CAPTIONING

Cited by: 2
Authors
Ma, Zhiming [1, 2]
Yuan, Chun [1, 2]
Cheng, Yangyang [1, 2]
Zhu, Xinrui [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Tsinghua Univ, Grad Sch Shenzhen, Beijing, Peoples R China
Keywords
Image captioning; Dependency tree; Tree-structured decoder
DOI
10.1109/ICME.2019.00225
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Automatically generating natural language descriptions of images is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In recent years, image captioning has achieved tremendous success under the encoder-decoder framework, in which decoders are typically chain-structured Recurrent Neural Networks (RNNs) that treat sentences as sequences. However, natural sentences are not inherently linear structures but hierarchical ones. In this paper, we propose, for the first time, a model with a tree-structured decoder for image captioning (Image-to-Tree), which does not generate sentences directly but instead explicitly generates their dependency trees in a top-down manner. Inspired by the success of the attention mechanism in image captioning, we also propose a corresponding attention-based Image-to-Tree model. Experiments on the MSCOCO dataset demonstrate that our model achieves results comparable to those of chain-structured models on various language metrics.
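To illustrate what "generating a dependency tree in a top-down manner" means, here is a minimal sketch, not the authors' implementation: decoding starts at the root (head) word, each node is expanded into its left and right dependents, and the caption is read off the finished tree by an in-order traversal. The names `Node`, `generate_top_down`, `linearize`, and `TOY_TREE` are hypothetical, and the `expand` function is a hardcoded lookup standing in for the learned (optionally attention-based) neural decoder described in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    word: str
    left: list[Node] = field(default_factory=list)   # dependents that precede the head
    right: list[Node] = field(default_factory=list)  # dependents that follow the head


def generate_top_down(head: str, expand: Callable[[str], tuple[list[str], list[str]]]) -> Node:
    """Grow a dependency tree from the root downward.

    `expand` is a hypothetical stand-in for a neural decoder: given a head word,
    it returns (left_dependents, right_dependents).
    """
    node = Node(head)
    left_words, right_words = expand(head)
    node.left = [generate_top_down(w, expand) for w in left_words]
    node.right = [generate_top_down(w, expand) for w in right_words]
    return node


def linearize(node: Node) -> list[str]:
    """In-order traversal: left dependents, then the head word, then right dependents."""
    words: list[str] = []
    for child in node.left:
        words.extend(linearize(child))
    words.append(node.word)
    for child in node.right:
        words.extend(linearize(child))
    return words


# Hypothetical toy "decoder output" keyed by head word (assumes no repeated words).
TOY_TREE = {
    "runs": (["dog"], ["on"]),
    "dog": (["a"], []),
    "on": ([], ["grass"]),
    "grass": (["the"], []),
}

tree = generate_top_down("runs", lambda w: TOY_TREE.get(w, ([], [])))
print(" ".join(linearize(tree)))  # a dog runs on the grass
```

Separating tree growth from linearization mirrors the abstract's point that the decoder produces the sentence's hierarchical structure first, with word order recovered only afterwards.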
Pages: 1294-1299
Number of pages: 6