Video representation learning by identifying spatio-temporal transformations

Cited by: 0
Authors
Sheng Geng
Shimin Zhao
Hu Liu
Affiliation
[1] Shanghai Institute of Technology,
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Self-supervised learning; 3D video representation; Unlabelled videos; Spatio-temporal transformations;
DOI
Not available
Abstract
Self-supervised learning has become a prevalent paradigm in both the image and video domains because large amounts of annotated data are difficult to obtain. In this paper, we adopt the self-supervised learning paradigm and propose to learn 3D video representations by identifying spatio-temporal transformations. Specifically, we choose a set of transformations and apply them to unlabelled videos to change their spatio-temporal structure. By identifying which transformation was applied, the network learns about both the spatial appearance and the temporal relations of video frames. We choose spatio-temporal rotations as the transformations. We conduct extensive experiments to validate the effectiveness of the proposed method. After fine-tuning on action recognition benchmarks, our model yields a gain of 29.6% on UCF101 and 25.1% on HMDB51 over models trained from scratch, which places it among current advanced methods.
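The pretext task described in the abstract (transform an unlabelled clip, then ask the network which transformation was applied) can be illustrated with a minimal NumPy sketch. The abstract does not list the exact rotation set, so the `TRANSFORMS` table, the `(T, H, W, C)` clip layout, the cubic clip size, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical transformation set (an assumption, not taken from the paper).
# Clip layout is (T, H, W, C). Each entry is (rotation plane, number of 90-degree turns).
# Plane (1, 2) rotates every frame in the H-W plane (purely spatial);
# planes (0, 1) and (0, 2) mix the time axis with a spatial axis (spatio-temporal).
TRANSFORMS = [
    ((1, 2), 0),  # identity
    ((1, 2), 1),  # spatial rotation, 90 degrees
    ((1, 2), 2),  # spatial rotation, 180 degrees
    ((1, 2), 3),  # spatial rotation, 270 degrees
    ((0, 1), 1),  # rotation in the T-H plane, 90 degrees
    ((0, 2), 1),  # rotation in the T-W plane, 90 degrees
]


def apply_transform(clip, label):
    """Rotate a (T, H, W, C) clip according to TRANSFORMS[label]."""
    axes, k = TRANSFORMS[label]
    # np.rot90 returns a view; copy so the result is a contiguous array.
    return np.ascontiguousarray(np.rot90(clip, k=k, axes=axes))


def make_pretext_sample(clip, rng):
    """Pick a random transformation; its index is the self-supervised label."""
    label = int(rng.integers(len(TRANSFORMS)))
    return apply_transform(clip, label), label


# Usage: a cubic clip (T == H == W) keeps its shape under every rotation above.
rng = np.random.default_rng(0)
clip = rng.random((16, 16, 16, 3)).astype(np.float32)
transformed, label = make_pretext_sample(clip, rng)
```

A 3D network would then be trained with an ordinary classification loss to predict `label` from `transformed`. The cubic clip is a convenience of this sketch: a 90-degree rotation in a plane that includes the time axis swaps the sizes of the two axes involved, so non-cubic clips would need cropping or resizing first.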
Pages: 6613 - 6622
Number of pages: 9
Related papers
50 in total
  • [1] Video representation learning by identifying spatio-temporal transformations
    Geng, Sheng
    Zhao, Shimin
    Liu, Hu
    [J]. APPLIED INTELLIGENCE, 2022, 52 (06) : 6613 - 6622
  • [2] Spatio-Temporal Crop Aggregation for Video Representation Learning
    Sameni, Sepehr
    Jenni, Simon
    Favaro, Paolo
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5641 - 5651
  • [3] Learning Spatio-temporal Representation by Channel Aliasing Video Perception
    Lin, Yiqi
    Wang, Jinpeng
    Zhang, Manlin
    Ma, Andy J.
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2317 - 2325
  • [4] Video2Vec: Learning Semantic Spatio-Temporal Embeddings for Video Representation
    Hu, Sheng-Hung
    Li, Yikang
    Li, Baoxin
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 811 - 816
  • [5] Unsupervised Learning of Spatio-Temporal Representation with Multi-Task Learning for Video Retrieval
    Kumar, Vidit
    [J]. 2022 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2022, : 118 - 123
  • [6] Sparse Representation With Spatio-Temporal Online Dictionary Learning for Promising Video Coding
    Dai, Wenrui
    Shen, Yangmei
    Tang, Xin
    Zou, Junni
    Xiong, Hongkai
    Chen, Chang Wen
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (10) : 4580 - 4595
  • [7] Self-Supervised Video Representation Learning by Uncovering Spatio-Temporal Statistics
    Wang, Jiangliu
    Jiao, Jianbo
    Bao, Linchao
    He, Shengfeng
    Liu, Wei
    Liu, Yun-hui
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (07) : 3791 - 3806
  • [8] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [9] Transformer with Spatio-Temporal Representation for Video Anomaly Detection
    Sun, Xiaohu
    Chen, Jinyi
    Shen, Xulin
    Li, Hongjun
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2022, 2022, 13813 : 213 - 222
  • [10] A probabilistic framework for spatio-temporal video representation & indexing
    Greenspan, H
    Goldberger, J
    Mayer, A
    [J]. COMPUTER VISION - ECCV 2002, PT IV, 2002, 2353 : 461 - 475