SSRL: Self-Supervised Spatial-Temporal Representation Learning for 3D Action Recognition

Cited by: 1
Authors
Jin, Zhihao [1]
Wang, Yifan [1]
Wang, Qicong [1]
Shen, Yehu [2]
Meng, Hongying [3]
Affiliations
[1] Xiamen Univ, Dept Comp Sci & Technol, Xiamen 361000, Peoples R China
[2] Suzhou Univ Sci & Technol, Coll Mech Engn, Suzhou 215009, Peoples R China
[3] Brunel Univ London, Dept Elect & Elect Engn, Uxbridge UB8 3PH, England
Keywords
Self-supervised learning; contrastive learning; skeleton action recognition; networks; LSTM
DOI
10.1109/TCSVT.2023.3284493
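Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809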
Abstract
For 3D action recognition, the main challenge is to extract long-range semantic information in both the temporal and spatial dimensions. To better mine this information from large numbers of unlabelled skeleton sequences, we propose Self-supervised Spatial-temporal Representation Learning (SSRL), a contrastive learning framework for learning skeleton representations. SSRL consists of two novel inference tasks that enable the network to learn global semantic information in the temporal and spatial dimensions, respectively. The temporal inference task learns the temporal persistence of human actions from temporally incomplete skeleton sequences, and the spatial inference task learns the spatially coordinated nature of human actions from spatially partial skeleton sequences. We design two transformation modules to realize these two tasks efficiently while fitting the encoder network. To avoid the difficulty of constructing and maintaining high-quality negative samples, the proposed framework learns by maintaining consistency among positive samples, without the need for any negative samples. Experiments demonstrate that the proposed method achieves better results than state-of-the-art methods under a variety of evaluation protocols on the NTU RGB+D 60, PKU-MMD and NTU RGB+D 120 datasets.
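The abstract gives no implementation details, so the following is only a rough illustration of the general idea it describes: build a temporally incomplete view and a spatially partial view of one skeleton sequence, then pull their embeddings toward the embedding of the full sequence using a loss that needs no negative samples. Everything here is an assumption made for illustration, not the paper's method: the function names, the random frame dropping, the joint zeroing, the toy linear encoder, and the negative cosine similarity loss (borrowed from common negative-free frameworks) are all hypothetical stand-ins.

```python
import numpy as np


def temporal_mask(seq, keep_ratio, rng):
    """Keep a random, order-preserving subset of frames (a temporally incomplete view)."""
    num_frames = seq.shape[0]
    n_keep = max(1, int(round(num_frames * keep_ratio)))
    idx = np.sort(rng.choice(num_frames, size=n_keep, replace=False))
    return seq[idx]


def spatial_mask(seq, joints_to_drop):
    """Zero out the selected joints in every frame (a spatially partial view)."""
    out = seq.copy()
    out[:, joints_to_drop, :] = 0.0
    return out


def negative_cosine(p, z):
    """Consistency loss between two positive views: -cos(p, z), which lies in [-1, 1]."""
    p = p / (np.linalg.norm(p) + 1e-8)
    z = z / (np.linalg.norm(z) + 1e-8)
    return -float(np.dot(p, z))


def toy_encoder(seq, weight):
    """Hypothetical placeholder encoder: average over frames, flatten, project linearly."""
    pooled = seq.mean(axis=0).reshape(-1)  # shape (joints * channels,)
    return weight @ pooled


rng = np.random.default_rng(0)
T, J, C, D = 64, 25, 3, 16  # frames, joints, coordinates, embedding size (assumed values)
sequence = rng.normal(size=(T, J, C))  # random stand-in for a real skeleton sequence
weight = rng.normal(size=(D, J * C))

full_view = toy_encoder(sequence, weight)
temporal_view = toy_encoder(temporal_mask(sequence, 0.5, rng), weight)
spatial_view = toy_encoder(spatial_mask(sequence, [4, 5, 6, 7]), weight)

# Only positive pairs are compared; no negative samples are constructed.
loss = negative_cosine(temporal_view, full_view) + negative_cosine(spatial_view, full_view)
```

In a real training setup the encoder would be a learned network and the loss would be minimized by gradient descent; negative-free frameworks also typically add a mechanism such as a stop-gradient or a momentum encoder to prevent collapse, which this sketch omits.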
Pages: 274-285
Number of pages: 12
Related papers
50 in total
  • [1] Self-Supervised Representation Learning With Spatial-Temporal Consistency for Sign Language Recognition
    Zhao, Weichao
    Zhou, Wengang
    Hu, Hezhen
    Wang, Min
    Li, Houqiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4188 - 4201
  • [2] Attentive spatial-temporal contrastive learning for self-supervised video representation
    Yang, Xingming
    Xiong, Sixuan
    Wu, Kewei
    Shan, Dongfeng
    Xie, Zhao
    [J]. IMAGE AND VISION COMPUTING, 2023, 137
  • [3] Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
    Zhang, Zehua
    Crandall, David
    [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 975 - 985
  • [4] Spatial-temporal 3D dependency matching with self-supervised deep learning for monocular visual sensing
    Song, Chengqun
    Niu, Maolong
    Liu, Zhaopeng
    Cheng, Jun
    Wang, Peng
    Li, Hongjian
    Hao, Luoying
    [J]. NEUROCOMPUTING, 2022, 481 : 11 - 21
  • [5] SELF-SUPERVISED 3D SKELETON REPRESENTATION LEARNING WITH ACTIVE SAMPLING AND ADAPTIVE RELABELING FOR ACTION RECOGNITION
    Wang, Guoquan
    Liu, Hong
    Guo, Tianyu
    Guo, Jingwen
    Wang, Ti
    Li, Yidi
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 56 - 60
  • [6] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
    Yang, Siyuan
    Liu, Jun
    Lu, Shijian
    Hwa, Er Meng
    Hu, Yongjian
    Kot, Alex C.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524
  • [7] Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
    Su, Yukun
    Lin, Guosheng
    Sun, Ruizhou
    Hao, Yun
    Wu, Qingyao
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 769 - 778
  • [8] Spatial-Temporal Asynchronous Normalization for Unsupervised 3D Action Representation Learning
    Liu, Mengyuan
    Bao, Youneng
    Liang, Yongsheng
    Meng, Fanyang
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 632 - 636
  • [9] Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
    Huang, Siyuan
    Xie, Yichen
    Zhu, Song-Chun
    Zhu, Yixin
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6515 - 6525
  • [10] Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction
    Li, Zhonghang
    Huang, Chao
    Xia, Lianghao
    Xu, Yong
    Pei, Jian
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2984 - 2996