Video representation learning by identifying spatio-temporal transformations

Authors
Sheng Geng
Shimin Zhao
Hu Liu
Affiliations
[1] Shanghai Institute of Technology
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Self-supervised learning; 3D video representation; Unlabelled videos; Spatio-temporal transformations
DOI
Not available
Abstract
Self-supervised learning has become a prevalent paradigm in both the image and video domains because large amounts of annotated data are difficult to obtain. In this paper, we adopt the self-supervised paradigm and propose to learn 3D video representations by identifying spatio-temporal transformations. Specifically, we choose a set of transformations and apply them to unlabelled videos to change their spatio-temporal structure. By identifying which transformation was applied, the network learns about both the spatial appearance and the temporal relations of video frames. We choose spatio-temporal rotations as the transformations. We conduct extensive experiments to validate the effectiveness of the proposed method. After fine-tuning on action recognition benchmarks, our model yields gains of 29.6% on UCF101 and 25.1% on HMDB51 over models trained from scratch, placing it among current advanced methods.
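The pretext task described in the abstract can be illustrated with a minimal sketch: rotate an unlabelled clip and use the index of the rotation as the classification target. The record does not specify the exact rotation set, so the `TRANSFORMS` list, the function names, and the cubic clip shape (T = H = W, chosen so that every rotation preserves the tensor shape) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical transformation set: (plane of rotation, number of 90-degree turns).
# Clip axes are (T, H, W, C). Plane (1, 2) is purely spatial; planes (0, 1)
# and (0, 2) mix the time axis with one spatial axis.
TRANSFORMS = [
    ((1, 2), 0),  # identity
    ((1, 2), 1),  # spatial rotation by 90 degrees
    ((1, 2), 2),  # spatial rotation by 180 degrees
    ((1, 2), 3),  # spatial rotation by 270 degrees
    ((0, 1), 1),  # rotation in the time-height plane
    ((0, 2), 1),  # rotation in the time-width plane
]


def apply_transform(clip, label):
    """Rotate a (T, H, W, C) clip with the transformation indexed by `label`."""
    axes, k = TRANSFORMS[label]
    return np.ascontiguousarray(np.rot90(clip, k=k, axes=axes))


def make_pretext_sample(clip, rng):
    """Pick a random transformation; return the rotated clip and its label."""
    label = int(rng.integers(len(TRANSFORMS)))
    return apply_transform(clip, label), label


rng = np.random.default_rng(0)
# Assumed cubic clip (T = H = W = 16) so rotations keep the shape unchanged.
clip = rng.random((16, 16, 16, 3), dtype=np.float32)
x, y = make_pretext_sample(clip, rng)
```

A 3D network would then be trained to predict `y` from `x` with a standard classification loss before being fine-tuned on action recognition.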
Pages: 6613-6622
Page count: 9