Self-supervised Spatiotemporal Learning via Video Clip Order Prediction

Times cited: 214
Authors
Xu, Dejing [1]
Xiao, Jun [1]
Zhao, Zhou [1]
Shao, Jian [1]
Xie, Di [2]
Zhuang, Yueting [1]
Affiliations
[1] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[2] Hikvision Research Institute, Hangzhou, Zhejiang, People's Republic of China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province
DOI
10.1109/CVPR.2019.01058
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a self-supervised spatiotemporal learning technique that leverages the chronological order of videos. Our method learns a spatiotemporal representation of a video by predicting the order of clips that have been shuffled from it. Because video category labels are not required, the technique can potentially exploit an effectively unlimited supply of unannotated videos. Some related works use frames; compared with frames, clips are more consistent with video dynamics, help reduce uncertainty about the order, and are therefore better suited to learning a video representation. 3D convolutional neural networks (3D CNNs) are used to extract features from the clips, and these features are then processed to predict the actual order. The learned representations are evaluated through nearest-neighbor retrieval experiments. We also use the learned networks as pre-trained models and fine-tune them on the action recognition task. Three types of 3D CNNs are tested in the experiments, and we obtain large improvements over existing self-supervised methods.
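The clip-order pretext task described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes clips of equal length separated by a fixed gap, treats each of the n! orderings of n clips as one class, and works on frame indices only. The function names `make_order_sample` and `restore_order` and the settings in the usage example (16-frame clips, 8-frame gaps, 3 clips) are hypothetical choices for illustration. In a full pipeline, a 3D CNN would encode each shuffled clip and a classifier would predict `label` from the clip features.

```python
import itertools
import random


def make_order_sample(num_frames, clip_len, interval, num_clips, rng):
    """Build one clip-order-prediction example from a video of num_frames frames.

    Returns (shuffled_clips, label): each clip is a list of frame indices, and
    label is the index (into the list of all permutations) of the shuffle applied.
    """
    # Total frames covered by num_clips clips with a fixed gap between neighbours.
    span = num_clips * clip_len + (num_clips - 1) * interval
    if num_frames < span:
        raise ValueError("video is too short for the requested clips")
    # Random start chosen so that every clip lies inside the video.
    start = rng.randint(0, num_frames - span)
    clips = [
        list(range(s, s + clip_len))
        for s in (start + i * (clip_len + interval) for i in range(num_clips))
    ]
    # Every ordering of the clips is one class: n clips -> n! classes.
    perms = list(itertools.permutations(range(num_clips)))
    label = rng.randrange(len(perms))
    # Position k of the shuffled sequence holds original clip perms[label][k].
    shuffled = [clips[j] for j in perms[label]]
    return shuffled, label


def restore_order(shuffled, label, num_clips):
    """Invert the shuffle given the (predicted or true) permutation label."""
    perm = list(itertools.permutations(range(num_clips)))[label]
    restored = [None] * num_clips
    for pos, original_index in enumerate(perm):
        restored[original_index] = shuffled[pos]
    return restored


if __name__ == "__main__":
    rng = random.Random(0)
    shuffled, label = make_order_sample(300, 16, 8, 3, rng)
    print("label:", label, "shuffled clip starts:", [c[0] for c in shuffled])
    print("restored clip starts:", [c[0] for c in restore_order(shuffled, label, 3)])
```

Because the number of classes is n!, the classification problem grows factorially with the number of clips (3 clips give 6 classes, 4 give 24), so a small number of clips keeps the task tractable.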
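The nearest-neighbor retrieval evaluation mentioned in the abstract can likewise be sketched. This assumes that feature vectors have already been extracted for query and gallery clips (for example by pooling a pretrained 3D CNN's outputs), uses cosine similarity, and follows a common top-k protocol in which a query counts as correct if any of its k nearest gallery items shares its class label. The paper's exact protocol and feature choice may differ; `topk_retrieval_accuracy` and its toy inputs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topk_retrieval_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, k):
    """Fraction of queries whose k nearest gallery items include one of the same class."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Rank gallery items by similarity to this query, most similar first.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine_similarity(feat, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


if __name__ == "__main__":
    gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gallery_labels = ["a", "b", "c"]
    queries = [[0.9, 0.1], [0.1, 0.9]]
    print(topk_retrieval_accuracy(queries, ["a", "b"], gallery, gallery_labels, k=1))  # prints 1.0
```

A brute-force ranking is used here for clarity; with large galleries one would normally L2-normalize the features once and compute all similarities as a single matrix product.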
Pages: 10326-10335
Number of pages: 10
Related papers
  • [1] Explore Video Clip Order With Self-Supervised and Curriculum Learning for Video Applications
    Xiao, Jun
    Li, Lin
    Xu, Dejing
    Long, Chengjiang
    Shao, Jian
    Zhang, Shifeng
    Pu, Shiliang
    Zhuang, Yueting
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 (23) : 3454 - 3466
  • [2] Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
    Liang, Hanwen
    Quader, Niamul
    Chi, Zhixiang
    Chen, Lizhe
    Dai, Peng
    Lu, Juwei
    Wang, Yang
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1564 - 1573
  • [3] Discriminative Spatiotemporal Alignment for Self-Supervised Video Correspondence Learning
    Wei, Qiaoqiao
    Zhang, Hui
    Yong, Jun-Hai
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1841 - 1846
  • [4] Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw
    Huo, Yuqi
    Ding, Mingyu
    Lu, Haoyu
    Huang, Ziyuan
    Tang, Mingqian
    Lu, Zhiwu
    Xiang, Tao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 751 - 757
  • [5] Contrast and Order Representations for Video Self-supervised Learning
    Hu, Kai
    Shao, Jie
    Liu, Yuan
    Raj, Bhiksha
    Savvides, Marios
    Shen, Zhiqiang
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7919 - 7929
  • [6] Progressive Video Summarization via Multimodal Self-supervised Learning
    Li, Haopeng
    Ke, Qiuhong
    Gong, Mingming
    Drummond, Tom
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5573 - 5582
  • [7] Self-Supervised Representation Learning for Videos by Segmenting via Sampling Rate Order Prediction
    Huang, Jing
    Huang, Yan
    Wang, Qicong
    Yang, Wenming
    Meng, Hongying
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3475 - 3489
  • [8] Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning
    Liu, Kangning
    Gu, Shuhang
    Romero, Andres
    Timofte, Radu
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1029 - 1039
  • [9] Self-Supervised Representation Learning via Latent Graph Prediction
    Xie, Yaochen
    Xu, Zhao
    Ji, Shuiwang
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [10] Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video
    Cho, Hyeon
    Kim, Taehoon
    Chang, Hyung Jin
    Hwang, Wonjun
    [J]. IEEE ACCESS, 2021, 9 : 79562 - 79571