Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

Cited by: 0
Authors
Huo, Yuqi [1 ,2 ]
Ding, Mingyu [3 ]
Lu, Haoyu [1 ]
Huang, Ziyuan [4 ]
Tang, Mingqian [5 ]
Lu, Zhiwu [2 ]
Xiang, Tao [6 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Natl Univ Singapore, Singapore, Singapore
[5] Alibaba Grp, Hangzhou, Peoples R China
[6] Univ Surrey, Surrey, England
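Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract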
This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature and a representation learned by detecting spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ) whereby the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist. This provides sufficient cues for the model to reason about the continuity. Instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable. The four tasks aim to learn representations sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that our CSJ achieves state-of-the-art results on various benchmarks.
Pages: 751-757
Number of pages: 7