Retrieval of Sentence Sequences for an Image Stream via Coherence Recurrent Convolutional Networks

Cited by: 22
Authors
Park, Cesc Chunseong [1]
Kim, Youngjin [1]
Kim, Gunhee [1]
Affiliation
[1] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul 151742, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Image captioning; bidirectional long short-term memory networks; convolutional neural networks; coherence models;
DOI
10.1109/TPAMI.2017.2700381
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures of their experiences, much online visual information exists in the form of image streams, for which it is better to consider the whole stream when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To retrieve a coherent flow of multiple sentences for a photo stream, we propose a multimodal neural architecture called the coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach learns directly from a vast user-generated resource of blog posts, used as text-image parallel training data. We collect more than 22K unique blog posts with 170K associated images for the travel topics of NYC, Disneyland, Australia, and Hawaii. We demonstrate that our approach outperforms other state-of-the-art image captioning methods for text sequence generation, using both quantitative measures and user studies via Amazon Mechanical Turk.
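The abstract does not spell out how CRCN combines its parts, so the following is only a rough, hypothetical illustration of the retrieval idea: score each candidate sentence sequence against an image stream and keep the best one. It is not the authors' model. There is no CNN or bidirectional LSTM here, because image and sentence embeddings are assumed to be precomputed. The coherence term is a simplified entity-grid transition statistic that records only whether an entity is present or absent, rather than the syntactic roles used in full entity-grid models. The names `sequence_score`, `coherence`, `retrieve` and the weight `alpha` are hypothetical.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def entity_grid(sentences, entities):
    """Simplified entity grid: grid[e][i] is True if entity e occurs in sentence i."""
    token_sets = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    return {e: [e in toks for toks in token_sets] for e in entities}


def transition_probs(grid):
    """Relative frequencies of the four length-2 present/absent transitions."""
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    total = 0
    for row in grid.values():
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
            total += 1
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def coherence(sentences, entities):
    """Hypothetical coherence score: share of transitions that keep an entity mentioned."""
    return transition_probs(entity_grid(sentences, entities))[(True, True)]


def sequence_score(img_vecs, sent_vecs, sentences, entities, alpha=0.5):
    """Mean image/sentence similarity of aligned pairs plus a weighted coherence term."""
    pair = sum(cosine(i, s) for i, s in zip(img_vecs, sent_vecs)) / len(img_vecs)
    return pair + alpha * coherence(sentences, entities)


def retrieve(img_vecs, candidates, entities, alpha=0.5):
    """Return the index of the best candidate; each candidate is (sent_vecs, sentences)."""
    scores = [sequence_score(img_vecs, sv, ss, entities, alpha) for sv, ss in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: two images, two candidate sentence sequences (all values made up).
images = [[1.0, 0.0], [0.0, 1.0]]
cand_a = ([[1.0, 0.0], [0.0, 1.0]], ["We reached the beach.", "The beach was crowded."])
cand_b = ([[0.0, 1.0], [1.0, 0.0]], ["We reached the beach.", "Dinner downtown."])
best = retrieve(images, [cand_a, cand_b], entities=["beach"])
```

Here candidate A wins because its sentences align with the images and keep mentioning the same entity. Separating a per-pair compatibility term from a sequence-level coherence term mirrors, loosely, the abstract's split between visual-textual matching and an entity-based coherence model; how CRCN actually fuses them is not stated in this record.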
Pages: 945-957
Number of pages: 13
Related papers
50 in total
  • [1] CLASSIFICATION OF SEVERELY OCCLUDED IMAGE SEQUENCES VIA CONVOLUTIONAL RECURRENT NEURAL NETWORKS
    Zheng, Jian
    Wang, Yifan
    Zhang, Xiaonan
    Li, Xiaohua
    [J]. 2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 395 - 399
  • [2] Recurrent networks with attention and convolutional networks for sentence representation and classification
    Liu, Tengfei
    Yu, Shuangyuan
    Xu, Baomin
    Yin, Hongfeng
    [J]. APPLIED INTELLIGENCE, 2018, 48 (10) : 3797 - 3806
  • [4] Multimodal Convolutional Neural Networks for Matching Image and Sentence
    Ma, Lin
    Lu, Zhengdong
    Shang, Lifeng
    Li, Hang
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2623 - 2631
  • [5] Fully convolutional recurrent networks for multidate crop recognition from multitemporal image sequences
    Chamorro Martinez, Jorge Andres
    Cue La Rosa, Laura Elena
    Feitosa, Raul Queiroz
    Sanches, Ieda Del'Arco
    Happ, Patrick Nigri
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2021, 171 : 188 - 201
  • [6] Sentence Ordering and Coherence Modeling Using Recurrent Neural Networks
    Logeswaran, Lajanugen
    Lee, Honglak
    Radev, Dragomir
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5285 - 5292
  • [7] Egocentric Image Retrieval with Convolutional Neural Networks
    Oliveira-Barra, Gabriel
    Dimiccoli, Mariella
    Radeva, Petia
    [J]. ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT, 2016, 288 : 71 - 76
  • [8] Sentence Learning on Deep Convolutional Networks for Image Caption Generation
    Kim, Dong-Jin
    Yoo, Donggeun
    Sim, Bonggeun
    Kweon, In So
    [J]. 2016 13TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2016, : 246 - 247
  • [9] Temporal Convolutional and Recurrent Networks for Image Captioning
    Iskra, Natalia
    Iskra, Vitaly
    [J]. PATTERN RECOGNITION AND INFORMATION PROCESSING, PRIP 2019, 2019, 1055 : 254 - 266
  • [10] ERCNN: Enhanced Recurrent Convolutional Neural Networks for Learning Sentence Similarity
    Xie, Niantao
    Li, Sujian
    Zhao, Jinglin
    [J]. CHINESE COMPUTATIONAL LINGUISTICS, CCL 2019, 2019, 11856 : 119 - 130