An Unsupervised Video Summarization Method Based on Multimodal Representation

Cited by: 0
Authors
Lei, Zhuo [1 ,2 ]
Yu, Qiang [1 ]
Shou, Lidan [2 ]
Li, Shengquan [1 ]
Mao, Yunqing [1 ]
Affiliations
[1] City Cloud Technol China Co Ltd, Hangzhou, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
Keywords
Video Summarization; Multi-modal Representation Learning; Unsupervised Learning;
DOI
10.1007/978-981-99-4761-4_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A good video summary should convey the whole story and feature the most important content. However, the importance of video content is often subjective, and users should be able to personalize the summary by specifying in natural language what is important to them. Moreover, existing methods usually rely only on visual cues to solve generic video summarization tasks, whereas this work introduces a single unsupervised multi-modal framework that addresses both generic and query-focused video summarization. We use a multi-head attention model to represent the multi-modal features. We apply a Transformer-based model to learn frame scores using representative, diversity, and reconstruction losses. In particular, we develop a novel representative loss that trains the model on both visual and semantic information. Our method outperforms previous state-of-the-art work on both generic and query-focused video summarization datasets.
Pages: 171-180
Number of pages: 10