Self-supervised Spatiotemporal Learning via Video Clip Order Prediction

Cited by: 214

Authors:

Xu, Dejing [1]

Xiao, Jun [1]

Zhao, Zhou [1]

Shao, Jian [1]

Xie, Di [2]

Zhuang, Yueting [1]

Affiliations:

[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China

[2] Hikvis Res Inst, Hangzhou, Zhejiang, Peoples R China

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;

Keywords:

DOI:

10.1109/CVPR.2019.01058

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a self-supervised spatiotemporal learning technique which leverages the chronological order of videos. Our method can learn the spatiotemporal representation of the video by predicting the order of shuffled clips from the video. The category of the video is not required, which gives our technique the potential to take advantage of infinite unannotated videos. There exist related works which use frames, while compared to frames, clips are more consistent with the video dynamics. Clips can help to reduce the uncertainty of orders and are more appropriate to learn a video representation. The 3D convolutional neural networks are utilized to extract features for clips, and these features are processed to predict the actual order. The learned representations are evaluated via nearest neighbor retrieval experiments. We also use the learned networks as the pre-trained models and finetune them on the action recognition task. Three types of 3D convolutional neural networks are tested in experiments, and we gain large improvements compared to existing self-supervised methods.
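The pretext task described in the abstract can be illustrated with a minimal sketch of how one training sample might be built: a few clips are taken from a video in chronological order, shuffled, and the index of the applied permutation serves as the classification target. This is not the authors' implementation. The function name `make_order_prediction_sample` and the defaults (3 clips, 16 frames per clip, 8 frames between clips) are assumptions chosen for illustration, and frame indices stand in for decoded frames.

```python
import itertools
import random


def make_order_prediction_sample(num_frames, num_clips=3, clip_len=16,
                                 interval=8, rng=None):
    """Build one (shuffled_clips, label, perms) sample for clip order prediction.

    All parameter names and default values are illustrative assumptions,
    not settings taken from the abstract.
    """
    rng = rng or random.Random()
    # Total number of frames covered by the clips plus the gaps between them.
    span = num_clips * clip_len + (num_clips - 1) * interval
    if num_frames < span:
        raise ValueError("video is too short for the requested clips")

    # Pick a random start, then cut non-overlapping clips in chronological order.
    start = rng.randrange(num_frames - span + 1)
    step = clip_len + interval
    clips = [list(range(start + i * step, start + i * step + clip_len))
             for i in range(num_clips)]

    # Every possible ordering is one class; the label is the class index.
    perms = list(itertools.permutations(range(num_clips)))
    label = rng.randrange(len(perms))
    shuffled = [clips[i] for i in perms[label]]
    return shuffled, label, perms


# Usage: the frame indices of each shuffled clip and the target class.
shuffled, label, perms = make_order_prediction_sample(200, rng=random.Random(0))
print([clip[0] for clip in shuffled], label, perms[label])
```

In a full pipeline, each shuffled clip would be passed through a 3D convolutional network and the resulting features combined to predict `label`. With three clips there are 3! = 6 possible orderings, so the prediction is a 6-way classification.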

Citation

Pages: 10326-10335

Page count: 10

50 items in total

[1] Explore Video Clip Order With Self-Supervised and Curriculum Learning for Video Applications
Xiao, Jun
Li, Lin
Xu, Dejing
Long, Chengjiang
Shao, Jian
Zhang, Shifeng
Pu, Shiliang
Zhuang, Yueting
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 (23) : 3454 - 3466
[2] Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
Liang, Hanwen
Quader, Niamul
Chi, Zhixiang
Chen, Lizhe
Dai, Peng
Lu, Juwei
Wang, Yang
[J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1564 - 1573
[3] Discriminative Spatiotemporal Alignment for Self-Supervised Video Correspondence Learning
Wei, Qiaoqiao
Zhang, Hui
Yong, Jun-Hai
[J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1841 - 1846
[4] Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw
Huo, Yuqi
Ding, Mingyu
Lu, Haoyu
Huang, Ziyuan
Tang, Mingqian
Lu, Zhiwu
Xiang, Tao
[J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 751 - 757
[5] Contrast and Order Representations for Video Self-supervised Learning
Hu, Kai
Shao, Jie
Liu, Yuan
Raj, Bhiksha
Savvides, Marios
Shen, Zhiqiang
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7919 - 7929
[6] Progressive Video Summarization via Multimodal Self-supervised Learning
Li, Haopeng
Ke, Qiuhong
Gong, Mingming
Drummond, Tom
[J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5573 - 5582
[7] Self-Supervised Representation Learning for Videos by Segmenting via Sampling Rate Order Prediction
Huang, Jing
Huang, Yan
Wang, Qicong
Yang, Wenming
Meng, Hongying
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3475 - 3489
[8] Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning
Liu, Kangning
Gu, Shuhang
Romero, Andres
Timofte, Radu
[J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1029 - 1039
[9] Self-Supervised Representation Learning via Latent Graph Prediction
Xie, Yaochen
Xu, Zhao
Ji, Shuiwang
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
[10] Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video
Cho, Hyeon
Kim, Taehoon
Chang, Hyung Jin
Hwang, Wonjun
[J]. IEEE ACCESS, 2021, 9 : 79562 - 79571
