Progressive Video Summarization via Multimodal Self-supervised Learning

Cited by: 16

Authors:

Li, Haopeng [1]

Ke, Qiuhong [3]

Gong, Mingming [2]

Drummond, Tom [1]

Affiliations:

[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia

[2] Univ Melbourne, Sch Math & Stat, Melbourne, Vic, Australia

[3] Monash Univ, Dept Data Sci & AI, Clayton, Vic, Australia

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

DOI:

10.1109/WACV56688.2023.00554

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep models. Considering that the annotation of large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task. Specifically, the self-supervised learning is conducted by exploring the semantic consistency between the videos and text in both coarse-grained and fine-grained fashions, as well as recovering masked frames in the videos. The multimodal framework is trained on a newly collected dataset that consists of video-text pairs. Additionally, we introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries. Extensive experiments have proved the effectiveness and superiority of our method in rank correlation coefficients and F-score.

Pages: 5573-5582

Page count: 10
