Weakly-Supervised Video Summarization Using Variational Encoder-Decoder and Web Prior

Cited by: 36
Authors
Cai, Sijia [1, 2]
Zuo, Wangmeng [3]
Davis, Larry S. [4]
Zhang, Lei [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[4] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Source
Keywords
Video summarization; Variational autoencoder;
DOI
10.1007/978-3-030-01264-9_12
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video summarization is a challenging, under-constrained problem because the underlying summary of a single video depends strongly on users' subjective understanding. Data-driven approaches, such as deep neural networks, can handle the ambiguity inherent in this task to some extent, but acquiring temporal annotations for a large-scale video dataset is extremely expensive. To leverage plentiful web-crawled videos to improve video summarization, we present a generative modelling framework that learns latent semantic video representations to bridge the benchmark data and the web data. Specifically, our framework couples two components: a variational autoencoder that learns latent semantics from web videos, and an encoder-attention-decoder that estimates the saliency of the raw video and generates the summary. We introduce a loss term that learns the semantic matching between the generated summaries and web videos, and further formulate the overall framework as a unified conditional variational encoder-decoder, called the variational encoder-summarizer-decoder (VESD). Experiments on the challenging CoSum and TVSum datasets demonstrate that the proposed VESD outperforms existing state-of-the-art methods. The source code of this work can be found at https://github.com/cssjcai/vesd.
Pages: 193-210
Page count: 18