Sequence to Sequence - Video to Text

Cited by: 817
Authors
Venugopalan, Subhashini [1]
Rohrbach, Marcus [2, 4]
Donahue, Jeff [2]
Mooney, Raymond [1]
Darrell, Trevor [2]
Saenko, Kate [3]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] Univ Massachusetts, Lowell, MA USA
[4] Int Comp Sci Inst, Berkeley, CA 94704 USA
DOI
10.1109/ICCV.2015.515
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (a sequence of frames) and output (a sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames with a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
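The encode-then-decode idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single LSTM cell with untrained random weights, greedy decoding, and a toy vocabulary, and every name in it (`TinyLSTM`, `caption`, `VOCAB`, the feature sizes) is hypothetical. It only shows how one recurrent model can read a variable-length sequence of frame features and then emit a variable-length sequence of words.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these from video-sentence pairs.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTM:
    """One LSTM cell with untrained random weights (illustrative assumption)."""

    def __init__(self, input_size, hidden_size, rng):
        # One weight matrix per gate, applied to [input; previous hidden state].
        self.gates = {
            name: make_matrix(hidden_size, input_size + hidden_size, rng)
            for name in ("i", "f", "o", "g")
        }

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
        g = [math.tanh(a) for a in matvec(self.gates["g"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


def caption(frames, vocab, feat_size, hidden_size=8, max_words=10, seed=0):
    """Read all frame features, then greedily emit words until <eos> or max_words."""
    rng = random.Random(seed)
    # Each input is [frame feature; one-hot previous word]; the unused half is zeros.
    lstm = TinyLSTM(feat_size + len(vocab), hidden_size, rng)
    out_proj = make_matrix(len(vocab), hidden_size, rng)
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    no_word, no_frame = [0.0] * len(vocab), [0.0] * feat_size

    # Encoding stage: consume the frame sequence, whatever its length.
    for frame in frames:
        h, c = lstm.step(list(frame) + no_word, h, c)

    # Decoding stage: feed back the previously chosen word at each step.
    words, prev = [], vocab.index("<bos>")
    for _ in range(max_words):
        one_hot = [1.0 if k == prev else 0.0 for k in range(len(vocab))]
        h, c = lstm.step(no_frame + one_hot, h, c)
        scores = matvec(out_proj, h)
        prev = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[prev] == "<eos>":
            break
        words.append(vocab[prev])
    return words


VOCAB = ["<bos>", "<eos>", "a", "man", "is", "cooking"]  # toy vocabulary
FRAMES = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.0, 0.5], [0.9, 0.8, 0.1, 0.0]]
print(caption(FRAMES, VOCAB, feat_size=4))
```

Because the weights are random, the printed words are meaningless; the point is the control flow. The same recurrent state carries information from the frame sequence into word generation, and neither the number of frames nor the number of words is fixed in advance.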
Pages: 4534-4542
Page count: 9