Spatiotemporal consistency enhancement self-supervised representation learning for action recognition

Cited by: 1
Authors
Bi, Shuai [1]
Hu, Zhengping [1]
Zhao, Mengyao [1]
Li, Shufang [1, 2]
Sun, Zhe [1]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, West Hebei St 438, Qinhuangdao 066004, Hebei, Peoples R China
[2] Hebei Univ Environm Engn, Dept Informat Engn, Jingang Rd 8, Qinhuangdao 066102, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unlabeled data; Self-supervised representation learning; Contrastive learning; Action recognition;
DOI
10.1007/s11760-022-02357-2
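Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809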
Abstract
Self-supervised learning has shown great potential for extracting useful features from abundant unlabeled image data. For video, however, a model needs strong representation capability to exploit the rich spatiotemporal information and capture the relationships between different instances. This paper describes a spatiotemporal consistency enhancement method for self-supervised representation learning, applied to action recognition. In contrast to typical contrastive learning methods, which use only positive-negative pairs to learn invariant features, we design spatiotemporal data augmentations for feature similarity comparison. Specifically, we first extract motion information from the video frames, which preserves the action of the original video. We then add static frames to these motion features to construct distracting positive video samples, which mitigates the effect of irrelevant variables on model discrimination. In addition, we corrupt the order of the video frames to generate extra categories of negative samples, which are distinguished from the original frames by their temporal differences. Finally, the learned features are used for the downstream action recognition task, and experimental results show that the method improves recognition accuracy on the UCF101 and HMDB51 video action datasets.
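The sample construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that motion is approximated by differences between consecutive frames, that the static frame is the first frame of the clip blended with a hypothetical `weight`, and that an InfoNCE-style loss compares embeddings; the abstract specifies none of these details. All function names are illustrative, and frames are flat lists of floats to keep the example dependency-free.

```python
import math
import random


def frame_differences(video):
    """Motion proxy (assumption): difference between consecutive frames."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(video, video[1:])]


def add_static_frame(motion, static_frame, weight=0.5):
    """Distracting positive: blend one static frame into every motion frame."""
    return [[m + weight * s for m, s in zip(frame, static_frame)]
            for frame in motion]


def shuffle_frames(video, rng):
    """Temporal negative: the same frames in a corrupted order."""
    order = list(range(len(video)))
    # Reshuffle until the order really differs from the original one.
    while len(video) > 1 and order == sorted(order):
        rng.shuffle(order)
    return [video[i] for i in order]


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed; not named in the abstract)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy clip: four frames of two "pixels" each.
video = [[0.0, 0.0], [1.0, 2.0], [3.0, 5.0], [6.0, 9.0]]
motion = frame_differences(video)              # motion-only view of the clip
positive = add_static_frame(motion, video[0])  # distracting positive sample
negative = shuffle_frames(video, random.Random(0))  # order-corrupted negative
```

In a real pipeline, each constructed clip would pass through a video encoder before the loss is computed; here `info_nce` simply takes already-computed embedding vectors.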
Pages: 1485-1492
Page count: 8
Related papers
  • [1] Spatiotemporal consistency enhancement self-supervised representation learning for action recognition
    Shuai Bi
    Zhengping Hu
    Mengyao Zhao
    Shufang Li
    Zhe Sun
    [J]. Signal, Image and Video Processing, 2023, 17 : 1485 - 1492
  • [2] SELFGAIT: A SPATIOTEMPORAL REPRESENTATION LEARNING METHOD FOR SELF-SUPERVISED GAIT RECOGNITION
    Liu, Yiqun
    Zeng, Yi
    Pu, Jian
    Shan, Hongming
    He, Peiyang
    Zhang, Junping
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2570 - 2574
  • [3] Self-supervised action representation learning from partial consistency skeleton sequences
    Lin B.
    Zhan Y.
    [J]. Neural Computing and Applications, 2024, 36 (20) : 12385 - 12395
  • [4] Self-Supervised Motion Perception for Spatiotemporal Representation Learning
    Liu, Chang
    Yao, Yuan
    Luo, Dezhao
    Zhou, Yu
    Ye, Qixiang
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 9832 - 9846
  • [5] Self-Supervised Representation Learning With Spatial-Temporal Consistency for Sign Language Recognition
    Zhao, Weichao
    Zhou, Wengang
    Hu, Hezhen
    Wang, Min
    Li, Houqiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4188 - 4201
  • [6] Self-Supervised Image Representation Learning with Geometric Set Consistency
    Chen, Nenglun
    Chu, Lei
    Pan, Hao
    Lu, Yan
    Wang, Wenping
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19270 - 19280
  • [7] Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
    Liang, Hanwen
    Quader, Niamul
    Chi, Zhixiang
    Chen, Lizhe
    Dai, Peng
    Lu, Juwei
    Wang, Yang
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1564 - 1573
  • [8] Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw
    Huo, Yuqi
    Ding, Mingyu
    Lu, Haoyu
    Huang, Ziyuan
    Tang, Mingqian
    Lu, Zhiwu
    Xiang, Tao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 751 - 757
  • [9] Self-supervised representation learning for surgical activity recognition
    Paysan, Daniel
    Haug, Luis
    Bajka, Michael
    Oelhafen, Markus
    Buhmann, Joachim M.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (11) : 2037 - 2044
  • [10] Self-Supervised ECG Representation Learning for Emotion Recognition
    Sarkar, Pritam
    Etemad, Ali
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (03) : 1541 - 1554