Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

Authors
Huo, Yuqi [1, 2]
Ding, Mingyu [3]
Lu, Haoyu [1]
Huang, Ziyuan [4]
Tang, Mingqian [5]
Lu, Zhiwu [2]
Xiang, Tao [6]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Natl Univ Singapore, Singapore, Singapore
[5] Alibaba Grp, Hangzhou, Peoples R China
[6] Univ Surrey, Surrey, England
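Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405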
Abstract
This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature and a representation learned by detecting spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ) whereby the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist. This provides sufficient cues for the model to reason about the continuity. Instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable. The four tasks aim to learn representations sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that our CSJ achieves state-of-the-art on various benchmarks.
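The unconstrained 3D jigsaw mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the puzzles are built or how CSJ constrains them, so the code below is only an assumption about the plain, unconstrained variant: it cuts a clip into a grid of cuboids and shuffles them uniformly at random. The names `make_3d_jigsaw` and `count_arrangements` are hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def make_3d_jigsaw(video, grid, rng):
    """Split a (T, H, W, C) clip into a grid of cuboids and shuffle them.

    grid = (gt, gh, gw) is the number of pieces along time, height, width.
    Returns the shuffled clip and the permutation that was applied.
    Illustrative only: this is an unconstrained shuffle, not the CSJ scheme.
    """
    T, H, W, C = video.shape
    gt, gh, gw = grid
    if T % gt or H % gh or W % gw:
        raise ValueError("clip dimensions must be divisible by the grid")
    pt, ph, pw = T // gt, H // gh, W // gw

    # (gt, pt, gh, ph, gw, pw, C) -> (gt*gh*gw, pt, ph, pw, C)
    pieces = video.reshape(gt, pt, gh, ph, gw, pw, C)
    pieces = pieces.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, pt, ph, pw, C)

    # Piece perm[n] of the original clip ends up in grid slot n.
    perm = rng.permutation(len(pieces))
    shuffled = pieces[perm]

    # Reassemble the shuffled pieces into a clip of the original shape.
    out = shuffled.reshape(gt, gh, gw, pt, ph, pw, C)
    out = out.transpose(0, 3, 1, 4, 2, 5, 6).reshape(T, H, W, C)
    return out, perm


def count_arrangements(grid):
    """Number of distinct orderings of the pieces in the grid."""
    gt, gh, gw = grid
    return math.factorial(gt * gh * gw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 32, 32, 3))  # 8 frames of 32x32 RGB, synthetic data
    puzzle, perm = make_3d_jigsaw(clip, (2, 2, 2), rng)
    print(puzzle.shape, count_arrangements((2, 2, 2)))
```

Even a 2 x 2 x 2 grid admits 8! = 40320 orderings, and the count grows factorially with the number of pieces. That growth is one plausible reason a direct solution is hard, although the abstract attributes the intractability finding to the paper's experiments rather than to this argument.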
Pages: 751-757
Page count: 7
Related papers
50 in total
  • [21] Self-supervised Co-training for Video Representation Learning
    Han, Tengda
    Xie, Weidi
    Zisserman, Andrew
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [22] ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
    Das, Srijan
    Ryoo, Michael S.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5562 - 5572
  • [23] Temporally coherent embeddings for self-supervised video representation learning
    CSIRO-Data61, Brisbane, QLD 4069, Australia
    Affiliation not specified, QLD 4000, Australia
    Affiliation not specified, QLD 4072, Australia
    arXiv
  • [24] SELFGAIT: A SPATIOTEMPORAL REPRESENTATION LEARNING METHOD FOR SELF-SUPERVISED GAIT RECOGNITION
    Liu, Yiqun
    Zeng, Yi
    Pu, Jian
    Shan, Hongming
    He, Peiyang
    Zhang, Junping
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2570 - 2574
  • [25] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
    Bi, Shuai
    Hu, Zhengping
    Zhao, Mengyao
    Li, Shufang
    Sun, Zhe
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1485 - 1492
  • [27] Self-supervised Spatiotemporal Learning via Video Clip Order Prediction
    Xu, Dejing
    Xiao, Jun
    Zhao, Zhou
    Shao, Jian
    Xie, Di
    Zhuang, Yueting
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10326 - 10335
  • [28] Actor-Aware Self-Supervised Learning for Semi-Supervised Video Representation Learning
    Assefa, Maregu
    Jiang, Wei
    Alemu, Kumie Gedamu
    Yilma, Getinet
    Adhikari, Deepak
    Ayalew, Melese
    Seid, Abegaz Mohammed
    Erbad, Aiman
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6679 - 6692
  • [29] Enhancing motion visual cues for self-supervised video representation learning
    Nie, Mu
    Quan, Zhibin
    Ding, Weiping
    Yang, Wankou
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [30] TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning
    Liu, Yang
    Wang, Keze
    Liu, Lingbo
    Lan, Haoyuan
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1978 - 1993