Sequence to Sequence - Video to Text

Cited by: 817

Authors:

Venugopalan, Subhashini [1]

Rohrbach, Marcus [2,4]

Donahue, Jeff [2]

Mooney, Raymond [1]

Darrell, Trevor [2]

Saenko, Kate [3]

Affiliations:

[1] Univ Texas Austin, Austin, TX 78712 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

[3] Univ Massachusetts, Lowell, MA USA

[4] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2015

Keywords:

DOI:

10.1109/ICCV.2015.515

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this, we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip. Our model is naturally able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e., a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
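
The variable-length input and output described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a simplified tanh recurrent cell stands in for the LSTM, the weights are random and untrained, sharing the recurrent weights between the reading and writing phases is an assumption of this sketch, and every name here (`TinySeq2Seq`, `encode`, `decode`, the toy vocabulary) is hypothetical. It only shows how a recurrent state can first read any number of frame feature vectors and then emit words until an end-of-sentence token appears.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySeq2Seq:
    """Hypothetical encode-then-decode loop with a tanh cell (LSTM stand-in)."""

    def __init__(self, feat_dim, hidden_dim, vocab, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        self.w_in = make_matrix(hidden_dim, feat_dim, rng)     # frame features -> hidden
        self.w_emb = make_matrix(hidden_dim, len(vocab), rng)  # word index -> hidden
        self.w_hh = make_matrix(hidden_dim, hidden_dim, rng)   # hidden -> hidden
        self.w_out = make_matrix(len(vocab), hidden_dim, rng)  # hidden -> word scores

    def _step(self, hidden, projected_input):
        recurrent = matvec(self.w_hh, hidden)
        return [math.tanh(a + b) for a, b in zip(recurrent, projected_input)]

    def encode(self, frames):
        """Read a variable-length list of frame feature vectors."""
        hidden = [0.0] * self.hidden_dim
        for frame in frames:
            hidden = self._step(hidden, matvec(self.w_in, frame))
        return hidden

    def decode(self, hidden, max_len=10):
        """Greedily emit words until <eos> or max_len is reached."""
        words = []
        prev = self.vocab.index("<bos>")
        for _ in range(max_len):
            embedded = [row[prev] for row in self.w_emb]  # column lookup for prev word
            hidden = self._step(hidden, embedded)
            scores = matvec(self.w_out, hidden)
            prev = max(range(len(scores)), key=scores.__getitem__)
            if self.vocab[prev] == "<eos>":
                break
            words.append(self.vocab[prev])
        return words


if __name__ == "__main__":
    vocab = ["<bos>", "<eos>", "a", "man", "is", "cooking"]
    model = TinySeq2Seq(feat_dim=4, hidden_dim=8, vocab=vocab)
    clip = [[0.1, 0.2, 0.3, 0.4] for _ in range(7)]  # 7 frames of toy features
    print(model.decode(model.encode(clip), max_len=5))
```

Because the weights are untrained, the printed words are arbitrary. The point is only that the same object accepts clips of any length and produces captions of varying length, bounded by `max_len`.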

Pages: 4534-4542

Number of pages: 9
