Progressive Video Summarization via Multimodal Self-supervised Learning

Cited by: 16
Authors
Li, Haopeng [1]
Ke, Qiuhong [3]
Gong, Mingming [2]
Drummond, Tom [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
[2] Univ Melbourne, Sch Math & Stat, Melbourne, Vic, Australia
[3] Monash Univ, Dept Data Sci & AI, Clayton, Vic, Australia
DOI
10.1109/WACV56688.2023.00554
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, which easily leads to over-fitting of the deep models. Considering that annotating large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task. Specifically, the self-supervised learning is conducted by exploring the semantic consistency between the videos and text in both coarse-grained and fine-grained fashions, as well as by recovering masked frames in the videos. The multimodal framework is trained on a newly collected dataset that consists of video-text pairs. Additionally, we introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries. Extensive experiments demonstrate the effectiveness and superiority of our method in terms of rank correlation coefficients and F-score.
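The abstract reports results as rank correlation coefficients but does not say which ones. As an illustration only, the sketch below assumes Kendall's tau (the tau-a variant, which ignores tied pairs) as one such coefficient, computed between predicted and annotated frame-importance scores. The function name and the score values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(pred, ref):
    """Kendall's tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(pred)), 2):
        # Positive product: both lists order frames i and j the same way.
        s = (pred[i] - pred[j]) * (ref[i] - ref[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(pred) * (len(pred) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical frame-importance scores, for illustration only.
predicted = [0.9, 0.1, 0.5, 0.7]
annotated = [0.8, 0.2, 0.4, 0.9]
tau = kendall_tau(predicted, annotated)
```

A value near 1 means the predicted scores rank frames in almost the same order as the annotations, and a value near -1 means the order is reversed.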
Pages: 5573-5582
Number of pages: 10