An Unsupervised Video Summarization Method Based on Multimodal Representation

Cited by: 0
Authors
Lei, Zhuo [1, 2]
Yu, Qiang [1]
Shou, Lidan [2]
Li, Shengquan [1]
Mao, Yunqing [1]
Affiliations
[1] City Cloud Technol China Co Ltd, Hangzhou, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
Keywords
Video Summarization; Multi-modal Representation Learning; Unsupervised Learning
DOI
10.1007/978-981-99-4761-4_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A good video summary should convey the whole story and feature the most important content. However, the importance of video content is often subjective, so users should be able to personalize the summary by specifying in natural language what matters to them. Existing methods usually apply only visual cues to generic video summarization tasks, whereas this work introduces a single unsupervised multi-modal framework that addresses both generic and query-focused video summarization. A multi-head attention model represents the multi-modal features, and a Transformer-based model learns frame scores using representative, diversity, and reconstruction losses. In particular, we develop a novel representative loss that trains the model on both visual and semantic information. The method outperforms previous state-of-the-art work on both generic and query-focused video summarization datasets.
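The abstract names a diversity loss but does not define it. As a minimal sketch only, the block below shows one formulation commonly used in unsupervised summarization: the score-weighted mean pairwise cosine similarity of frame features, which is lower when highly scored frames differ from one another. The function names `cosine` and `diversity_loss`, the weighting scheme, and the toy feature vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def diversity_loss(features, scores):
    """Hypothetical diversity loss (an assumption, not the paper's definition).

    Averages the cosine similarity over all frame pairs, weighting each pair
    by the product of the two frame scores. Lower values mean the frames
    that received high scores are less similar to each other.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            weight = scores[i] * scores[j]
            weighted_sum += weight * cosine(features[i], features[j])
            weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


# Toy example: frames 0 and 1 are identical, frame 2 is orthogonal to both.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
redundant = diversity_loss(frames, [1.0, 1.0, 0.0])  # selects the two identical frames
diverse = diversity_loss(frames, [1.0, 0.0, 1.0])    # selects two orthogonal frames
```

In this toy setup, scoring the two identical frames highly yields a larger loss than scoring two orthogonal frames highly, which is the behavior a diversity term is meant to penalize. A trainable version would use differentiable tensors rather than Python lists.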
Pages: 171 - 180
Number of pages: 10
Related papers
50 in total
  • [1] A GAN based Video Summarization Method with Representation Loss
    Lei, Zhuo
    Yu, Qiang
    Shou, Lidan
    Li, Shengquan
    Mao, Yunqing
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1155 - 1159
  • [2] Video Summarization Based on Multimodal Features
    Zhang, Yu
    Liu, Ju
    Liu, Xiaoxi
    Gao, Xuesong
    INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2020, 11 (04): : 60 - 76
  • [3] Video summarization based on semantic representation
    Carlos, RP
    Uehara, K
    ADVANCED MULTIMEDIA CONTENT PROCESSING, 1999, 1554 : 1 - 16
  • [4] Mutual Information based Method for Unsupervised Disentanglement of Video Representation
    Sreekar, P. Aditya
    Tiwari, Ujjwal
    Namboodiri, Anoop
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 6396 - 6403
  • [5] Unsupervised Video Summarization based on Consistent Clip Generation
    Ai, Xin
    Song, Yan
    Li, Zechao
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [6] Efficient video summarization based on a fuzzy video content representation
    Doulamis, AD
    Doulamis, ND
    Kollias, SD
    ISCAS 2000: IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - PROCEEDINGS, VOL IV: EMERGING TECHNOLOGIES FOR THE 21ST CENTURY, 2000, : 301 - 304
  • [7] Interactive System for Video Summarization Based on Multimodal Fusion
    Zheng Li
    Xiaobing Du
    Cuixia Ma
    Yanfeng Li
    Hongan Wang
    Journal of Beijing Institute of Technology, 2019, 28 (01) : 27 - 34
  • [8] Interactive System for Video Summarization Based on Multimodal Fusion
    Li Z.
    Du X.
    Ma C.
    Li Y.
    Wang H.
    Journal of Beijing Institute of Technology (English Edition), 2019, 28 (01): : 27 - 34
  • [9] Multimodal Video Summarization based on Fuzzy Similarity Features
    Psallidas, Theodoros
    Vasilakakis, Michael D.
    Spyrou, Evaggelos
    Iakovidis, Dimitris K.
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,
  • [10] RL Based Unsupervised Video Summarization Framework for Ultrasound Imaging
    Mathews, Roshan P.
    Panicker, Mahesh Raveendranatha
    Hareendranathan, Abhilash R.
    Chen, Yale Tung
    Jaremko, Jacob L.
    Buchanan, Brian
    Narayan, Kiran Vishnu
    Chandrasekharan, Kesavadas
    Mathews, Greeta
    SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2022, 2022, 13565 : 23 - 33