An Unsupervised Video Summarization Method Based on Multimodal Representation

Cited by: 0
Authors
Lei, Zhuo [1 ,2 ]
Yu, Qiang [1 ]
Shou, Lidan [2 ]
Li, Shengquan [1 ]
Mao, Yunqing [1 ]
Affiliations
[1] City Cloud Technol China Co Ltd, Hangzhou, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
Keywords
Video Summarization; Multi-modal Representation Learning; Unsupervised Learning;
DOI
10.1007/978-981-99-4761-4_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A good video summary should convey the whole story and feature the most important content. However, the importance of video content is often subjective, and users should have the option to personalize the summary by using natural language to specify what is important to them. Moreover, existing methods usually rely only on visual cues to solve generic video summarization tasks, whereas this work introduces a single unsupervised multi-modal framework that addresses both generic and query-focused video summarization. We use a multi-head attention model to represent the multi-modal features. We apply a Transformer-based model to learn the frame scores using representative, diversity, and reconstruction losses. In particular, we develop a novel representative loss that trains the model on both visual and semantic information. Our method outperforms previous state-of-the-art work on both generic and query-focused video summarization datasets.
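The abstract names a diversity loss but does not define it. As a rough illustration only, the sketch below uses one common formulation from the unsupervised summarization literature: the score-weighted mean pairwise cosine similarity between frame features, which is low when highly scored frames differ from one another. The function names, the weighting scheme, and the toy features are all assumptions, not the authors' definition.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def diversity_loss(features, scores):
    """Score-weighted mean pairwise similarity of frames.

    Hypothetical formulation (not taken from the paper): each frame pair
    (i, j) is weighted by scores[i] * scores[j], so only pairs of frames
    that are both likely to be selected contribute. Lower is more diverse.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            weight = scores[i] * scores[j]
            weighted_sum += weight * cosine_similarity(features[i], features[j])
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Toy example: frames 0 and 1 are identical, frame 2 is orthogonal to both.
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
redundant = diversity_loss(toy_features, [1.0, 1.0, 0.0])  # picks the duplicates
diverse = diversity_loss(toy_features, [1.0, 0.0, 1.0])    # picks distinct frames
```

In this toy case the redundant selection is penalized more heavily than the diverse one. In a full model, a term of this kind would be combined with the representative and reconstruction terms, whose exact forms are not given in the abstract.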
Pages: 171-180
Page count: 10
Related papers
50 in total
  • [41] Unsupervised video summarization via clustering validity index
    Ye Zhao
    Yanrong Guo
    Rui Sun
    Zhengqiong Liu
    Dan Guo
    Multimedia Tools and Applications, 2020, 79 : 33417 - 33430
  • [42] ADVERSARIAL UNSUPERVISED VIDEO SUMMARIZATION AUGMENTED WITH DICTIONARY LOSS
    Kaseris, Michail
    Mademlis, Ioannis
    Pitas, Ioannis
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2683 - 2687
  • [43] Unsupervised Reinforcement Learning For Video Summarization Reward Function
    Wang, Lei
    Zhu, Yaping
    Pan, Hong
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO AND SIGNAL PROCESSING (IVSP 2019), 2019, : 40 - 44
  • [44] EXPLORING THE INFLUENCE OF FEATURE REPRESENTATION FOR DICTIONARY SELECTION BASED VIDEO SUMMARIZATION
    Ma, Mingyang
    Mei, Shaohui
    Ji, Jingyu
    Wan, Shuai
    Wang, Zhiyong
    Feng, Dagan
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 2911 - 2915
  • [45] A Static Video Summarization Method Based on Hierarchical Clustering
    Guimaraes, Silvio Jamil F.
    Gomes, Willer
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, 2010, 6419 : 46 - 54
  • [46] Video Summarization Method Based on the Weber Local Descriptor
    Cirne, Marcos Vinicius Mussel
    Pedrini, Helio
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1304 - 1309
  • [47] A Novel Hierarchical Dynamic Video Summarization Representation for Video Analysis
    Li, Xiangwei
    Kang, Yuxiu
    Zheng, Gang
    MECHATRONICS AND INTELLIGENT MATERIALS II, PTS 1-6, 2012, 490-495 : 465 - +
  • [48] Multimodal Local Feature Enhancement Network for Video Summarization
    Li, Zhaoyun
    Ren, Xiwei
    Du, Fengyi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 158 - 169
  • [49] Multimodal pretraining for unsupervised protein representation learning
    Nguyen, Viet Thanh Duy
    Hy, Truong Son
    BIOLOGY METHODS & PROTOCOLS, 2024, 9 (01):
  • [50] Code Search Method based on Multimodal Representation
    Chen, Xiao
    Wu, Junhua
    2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY COMPANION, QRS-C, 2022, : 485 - 491