Self-supervised Spatiotemporal Learning via Video Clip Order Prediction

Cited by: 214
Authors
Xu, Dejing [1]
Xiao, Jun [1]
Zhao, Zhou [1]
Shao, Jian [1]
Xie, Di [2]
Zhuang, Yueting [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Hikvis Res Inst, Hangzhou, Zhejiang, Peoples R China
Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR.2019.01058
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a self-supervised spatiotemporal learning technique that leverages the chronological order of videos. Our method learns a spatiotemporal representation of a video by predicting the order of shuffled clips taken from it. The category of the video is not required, which gives our technique the potential to take advantage of unlimited unannotated videos. Related works use individual frames for this purpose, but clips are more consistent with the video dynamics than frames are. Clips help reduce the uncertainty of orders and are more appropriate for learning a video representation. 3D convolutional neural networks are used to extract features from the clips, and these features are processed to predict the actual order. The learned representations are evaluated via nearest neighbor retrieval experiments. We also use the learned networks as pre-trained models and fine-tune them on the action recognition task. Three types of 3D convolutional neural networks are tested in the experiments, and we obtain large improvements over existing self-supervised methods.
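The clip order prediction task described in the abstract can be illustrated with a small sketch of how one training sample might be built. This is not the authors' code: the function names, the use of frame indices in place of real video frames, and the default values (3 clips, 16 frames per clip, 8 frames between clips) are assumptions chosen for illustration. The label is the index of the permutation applied to the clips, which is the target a classifier on top of the 3D CNN features would predict.

```python
import itertools
import random


def all_orders(num_clips):
    """Enumerate every possible clip order; the class label is an index into this list."""
    return list(itertools.permutations(range(num_clips)))


def sample_clips(num_frames, num_clips=3, clip_len=16, interval=8, rng=random):
    """Pick equally spaced, non-overlapping clips, each given as a list of frame indices.

    All default values are illustrative assumptions, not taken from the abstract.
    """
    span = num_clips * clip_len + (num_clips - 1) * interval
    if num_frames < span:
        raise ValueError("video too short for the requested clips")
    first = rng.randint(0, num_frames - span)  # inclusive on both ends
    stride = clip_len + interval
    return [
        list(range(first + i * stride, first + i * stride + clip_len))
        for i in range(num_clips)
    ]


def make_order_sample(num_frames, num_clips=3, clip_len=16, interval=8, rng=random):
    """Return (shuffled_clips, label) for one video.

    No action category is needed: the supervision signal is the shuffle itself.
    """
    clips = sample_clips(num_frames, num_clips, clip_len, interval, rng)
    orders = all_orders(num_clips)
    label = rng.randrange(len(orders))
    # Position j of the shuffled tuple holds the clip that was originally at orders[label][j].
    shuffled = [clips[i] for i in orders[label]]
    return shuffled, label


if __name__ == "__main__":
    shuffled, label = make_order_sample(num_frames=200, rng=random.Random(0))
    print("first frame of each shuffled clip:", [clip[0] for clip in shuffled])
    print("order label:", label, "->", all_orders(3)[label])
```

With 3 clips there are 3! = 6 possible orders, so the pretext task in this sketch is a 6-way classification problem. In a full pipeline, each list of frame indices would be replaced by the corresponding frames and passed through a shared 3D convolutional network before the order is predicted.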
Pages: 10326 - 10335
Page count: 10
Related papers
50 in total
  • [21] Self-supervised Learning for Unintentional Action Prediction
    Zatsarynna, Olga
    Abu Farha, Yazan
    Gall, Juergen
    [J]. PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 429 - 444
  • [22] Spatiotemporal self-supervised predictive learning for atmospheric variable prediction via multi-group multi-attention
    Shi, Zhensheng
    Zheng, Haiyong
    Dong, Junyu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [23] Self-supervised Video Hashing via Bidirectional Transformers
    Li, Shuyan
    Li, Xiu
    Lu, Jiwen
    Zhou, Jie
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13544 - 13553
  • [24] Self-Supervised Learning for Action Recognition by Video Denoising
    Thi Thu Trang Phung
    Thi Hong Thu Ma
    Van Truong Nguyen
    Duc Quang Vu
    [J]. 2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81
  • [25] Self-Supervised Video Defocus Deblurring with Atlas Learning
    Ruan, Lingyan
    Balint, Martin
    Bemana, Mojtaba
    Wolski, Krzysztof
    Seidel, Hans-Peter
    Myszkowski, Karol
    Chen, Bin
    [J]. PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [26] Video Face Clustering with Self-Supervised Representation Learning
    Sharma V.
    Tapaswi M.
    Saquib Sarfraz M.
    Stiefelhagen R.
    [J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02) : 145 - 157
  • [27] Broaden Your Views for Self-Supervised Video Learning
    Recasens, Adria
    Luc, Pauline
    Alayrac, Jean-Baptiste
    Wang, Luyu
    Strub, Florian
    Tallec, Corentin
    Malinowski, Mateusz
    Patraucean, Viorica
    Altche, Florent
    Valko, Michal
    Grill, Jean-Bastien
    van den Oord, Aaron
    Zisserman, Andrew
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1235 - 1245
  • [28] Self-Supervised Representation Learning for Video Quality Assessment
    Jiang, Shaojie
    Sang, Qingbing
    Hu, Zongyao
    Liu, Lixiong
    [J]. IEEE TRANSACTIONS ON BROADCASTING, 2023, 69 (01) : 118 - 129
  • [29] Video Motion Perception for Self-supervised Representation Learning
    Li, Wei
    Luo, Dezhao
    Fang, Bo
    Li, Xiaoni
    Zhou, Yu
    Wang, Weiping
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 508 - 520
  • [30] Self-supervised learning of class embeddings from video
    Wiles, Olivia
    Koepke, A. Sophia
    Zisserman, Andrew
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 3019 - 3027