Weakly-Supervised Video Summarization Using Variational Encoder-Decoder and Web Prior

Cited by: 36

Authors:

Cai, Sijia ^{[1,2]}

Zuo, Wangmeng ^{[3]}

Davis, Larry S. ^{[4]}

Zhang, Lei ^{[1]}

Affiliations:

[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China

[2] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China

[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

[4] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA

Source:

COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218

Keywords:

Video summarization; Variational autoencoder;

DOI:

10.1007/978-3-030-01264-9_12

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video summarization is a challenging under-constrained problem because the underlying summary of a single video strongly depends on users' subjective understandings. Data-driven approaches, such as deep neural networks, can deal with the ambiguity inherent in this task to some extent, but it is extremely expensive to acquire the temporal annotations of a large-scale video dataset. To leverage the plentiful web-crawled videos to improve the performance of video summarization, we present a generative modelling framework to learn the latent semantic video representations to bridge the benchmark data and web data. Specifically, our framework couples two important components: a variational autoencoder for learning the latent semantics from web videos, and an encoder-attention-decoder for saliency estimation of raw video and summary generation. A loss term to learn the semantic matching between the generated summaries and web videos is presented, and the overall framework is further formulated into a unified conditional variational encoder-decoder, called variational encoder-summarizer-decoder (VESD). Experiments conducted on the challenging datasets CoSum and TVSum demonstrate the superior performance of the proposed VESD to existing state-of-the-art methods. The source code of this work can be found at https://github.com/cssjcai/vesd.


Pages: 193-210

Number of pages: 18
