Self-Supervised Learning for Videos: A Survey

Cited by: 30
Authors
Schiappa, Madeline C. [1]
Rawat, Yogesh S. [1]
Shah, Mubarak [1]
Affiliations
[1] Univ Cent Florida, Ctr Res Comp Vis, 4328 Scorpius St Suite 245, Orlando, FL 43282 USA
Keywords
Self-supervised learning; deep learning; video understanding; zero-shot learning; representation learning; multimodal learning; visual-language models; SPEAKER
DOI
10.1145/3577925
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos. Moreover, the use of human-generated annotations leads to models with biased learning and poor domain generalization and robustness. As an alternative, self-supervised learning provides a way to learn representations without annotations and has shown promise in both the image and video domains. In contrast to the image domain, learning video representations is more challenging due to the temporal dimension, which brings in motion and other environmental dynamics. This also provides opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domains. In this survey, we provide a review of existing approaches to self-supervised learning, focusing on the video domain. We summarize these methods into four categories based on their learning objectives: (1) pretext tasks, (2) generative learning, (3) contrastive learning, and (4) cross-modal agreement. We further introduce the commonly used datasets and downstream evaluation tasks, offer insights into the limitations of existing works, and outline potential future directions in this area.
Pages: 37
Related papers
50 in total
  • [1] Audio self-supervised learning: A survey
    Liu, Shuo
    Mallol-Ragolta, Adria
    Parada-Cabaleiro, Emilia
    Qian, Kun
    Jing, Xin
    Kathan, Alexander
    Hu, Bin
    Schuller, Bjorn W.
    [J]. PATTERNS, 2022, 3 (12)
  • [2] A Survey on Contrastive Self-Supervised Learning
    Jaiswal, Ashish
    Babu, Ashwin Ramesh
    Zadeh, Mohammad Zaki
    Banerjee, Debapriya
    Makedon, Fillia
    [J]. TECHNOLOGIES, 2021, 9 (01)
  • [3] Graph Self-Supervised Learning: A Survey
    Liu, Yixin
    Jin, Ming
    Pan, Shirui
    Zhou, Chuan
    Zheng, Yu
    Xia, Feng
    Yu, Philip S.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 5879 - 5900
  • [4] Self-supervised Object-Centric Learning for Videos
    Aydemir, Gorkay
    Xie, Weidi
    Guney, Fatma
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [5] Self-Supervised Learning to Detect Key Frames in Videos
    Yan, Xiang
    Gilani, Syed Zulqarnain
    Feng, Mingtao
    Zhang, Liang
    Qin, Hanlin
    Mian, Ajmal
    [J]. SENSORS, 2020, 20 (23) : 1 - 18
  • [6] Exploring Relations in Untrimmed Videos for Self-Supervised Learning
    Luo, Dezhao
    Zhou, Yu
    Fang, Bo
    Zhou, Yucan
    Wu, Dayan
    Wang, Weiping
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (01)
  • [7] Self-Supervised Learning for Recommender Systems: A Survey
    Yu, Junliang
    Yin, Hongzhi
    Xia, Xin
    Chen, Tong
    Li, Jundong
    Huang, Zi
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (01) : 335 - 355
  • [8] Segmenting Cardiac Ultrasound Videos Using Self-Supervised Learning
    Lamoureux, Erik
    Ayromlou, Sana
    Amiri, Seyedeh Neda Ahmadi
    Rhodin, Helge
    [J]. 2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023
  • [9] SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
    Khorasgani, Salar Hosseini
    Chen, Yuxuan
    Shkurti, Florian
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 16070 - 16080
  • [10] Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency
    Qing, Zhiwu
    Zhang, Shiwei
    Huang, Ziyuan
    Xu, Yi
    Wang, Xiang
    Gao, Changxin
    Jin, Rong
    Sang, Nong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12408 - 12426