Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

Cited by: 0
Authors
Huo, Yuqi [1, 2]
Ding, Mingyu [3]
Lu, Haoyu [1]
Huang, Ziyuan [4]
Tang, Mingqian [5]
Lu, Zhiwu [2]
Xiang, Tao [6]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Natl Univ Singapore, Singapore, Singapore
[5] Alibaba Grp, Hangzhou, Peoples R China
[6] Univ Surrey, Surrey, England
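Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code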
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel pretext task for self-supervised video representation learning by exploiting spatiotemporal continuity in videos. It is motivated by the fact that videos are spatiotemporal by nature and a representation learned by detecting spatiotemporal continuity/discontinuity is thus beneficial for downstream video content analysis tasks. A natural choice of such a pretext task is to construct spatiotemporal (3D) jigsaw puzzles and learn to solve them. However, as we demonstrate in the experiments, this task turns out to be intractable. We thus propose Constrained Spatiotemporal Jigsaw (CSJ) whereby the 3D jigsaws are formed in a constrained manner to ensure that large continuous spatiotemporal cuboids exist. This provides sufficient cues for the model to reason about the continuity. Instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable. The four tasks aim to learn representations sensitive to spatiotemporal continuity at both the local and global levels. Extensive experiments show that our CSJ achieves state-of-the-art on various benchmarks.
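The abstract does not say how the constrained puzzles are built, so the following is only an illustrative sketch and not the authors' procedure. It assumes one possible constraint: instead of shuffling every small cuboid freely, whole slabs are permuted independently along the time, height and width axes. The names `axis_shuffle` and `puzzle_counts`, the clip size and the segment counts are all hypothetical. The second function compares the number of arrangements under free shuffling with the number under this per-axis constraint, which gives a rough sense of why the unconstrained puzzle is so much harder.

```python
import math
import numpy as np


def axis_shuffle(video, n_segments, rng):
    """Permute whole slabs of `video` independently along T, H and W.

    Assumed constraint for illustration only (not taken from the paper).
    video: array of shape (T, H, W, C); T, H and W must each be divisible
    by the matching entry of n_segments.
    Returns the shuffled video and the permutation used on each axis.
    """
    perms = []
    out = video
    for axis, n in enumerate(n_segments):
        if out.shape[axis] % n != 0:
            raise ValueError(f"axis {axis} cannot be split into {n} segments")
        perm = rng.permutation(n)
        slabs = np.split(out, n, axis=axis)  # n equal slabs along this axis
        out = np.concatenate([slabs[i] for i in perm], axis=axis)
        perms.append(perm)
    return out, perms


def puzzle_counts(n_segments):
    """Arrangement counts: free piece-wise shuffle vs. per-axis shuffle."""
    nt, nh, nw = n_segments
    unconstrained = math.factorial(nt * nh * nw)
    constrained = math.factorial(nt) * math.factorial(nh) * math.factorial(nw)
    return unconstrained, constrained


rng = np.random.default_rng(0)
video = rng.random((16, 112, 112, 3))  # hypothetical clip: 16 frames, 112x112 RGB
shuffled, perms = axis_shuffle(video, (4, 2, 2), rng)
free, limited = puzzle_counts((4, 2, 2))
print(shuffled.shape, free, limited)
```

With 4 x 2 x 2 pieces, free shuffling allows 16! arrangements while the per-axis variant allows only 4! * 2! * 2!, and every slab moves as one intact block.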
Pages: 751-757
Page count: 7