Motion Sensitive Contrastive Learning for Self-supervised Video Representation

Cited by: 5
Authors
Ni, Jingcheng [1 ,2 ]
Zhou, Nan [1 ,2 ]
Qin, Jie [3 ]
Wu, Qian [4 ]
Liu, Junqi [4 ]
Li, Boxun [4 ]
Huang, Di [1 ,2 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[4] MEGVII Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Video representation learning; Self-supervised learning; Local motion contrastive learning; Motion differential sampling;
DOI
10.1007/978-3-031-19833-5_27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contrastive learning has shown great potential in video representation learning. However, existing approaches fail to sufficiently exploit short-term motion dynamics, which are crucial to various downstream video understanding tasks. In this paper, we propose Motion Sensitive Contrastive Learning (MSCL), which injects the motion information captured by optical flow into RGB frames to strengthen feature learning. To achieve this, in addition to clip-level global contrastive learning, we develop Local Motion Contrastive Learning (LMCL) with frame-level contrastive objectives across the two modalities. Moreover, we introduce Flow Rotation Augmentation (FRA) to generate extra motion-shuffled negative samples and Motion Differential Sampling (MDS) to accurately screen training samples. Extensive experiments on standard benchmarks validate the effectiveness of the proposed method. With the commonly used 3D ResNet-18 as the backbone, we achieve top-1 accuracies of 91.5% on UCF101 and 50.3% on Something-Something v2 for video classification, and a top-1 recall of 65.6% on UCF101 for video retrieval, notably improving the state of the art.
Pages: 457 - 474
Page count: 18
Related papers
50 in total
  • [1] Video Motion Perception for Self-supervised Representation Learning
    Li, Wei
    Luo, Dezhao
    Fang, Bo
    Li, Xiaoni
    Zhou, Yu
    Wang, Weiping
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 508 - 520
  • [2] TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning
    Liu, Yang
    Wang, Keze
    Liu, Lingbo
    Lan, Haoyuan
    Lin, Liang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1978 - 1993
  • [3] Self-Supervised Video Representation Learning with Meta-Contrastive Network
    Lin, Yuanze
    Guo, Xun
    Lu, Yan
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 8219 - 8229
  • [4] Self-Supervised Facial Motion Representation Learning via Contrastive Subclips
    Sun, Zheng
    Torrie, Shad A.
    Sumsion, Andrew W.
    Lee, Dah-Jye
    [J]. ELECTRONICS, 2023, 12 (06)
  • [5] Continuous frame motion sensitive self-supervised collaborative network for video representation learning
    Bi, Shuai
    Hu, Zhengping
    Zhao, Mengyao
    Zhang, Hehao
    Di, Jirui
    Sun, Zhe
    [J]. ADVANCED ENGINEERING INFORMATICS, 2023, 56
  • [6] Self-supervised Video Representation Learning by Context and Motion Decoupling
    Huang, Lianghua
    Liu, Yu
    Wang, Bin
    Pan, Pan
    Xu, Yinghui
    Jin, Rong
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 13881 - 13890
  • [7] Masked Motion Encoding for Self-Supervised Video Representation Learning
    Sun, Xinyu
    Chen, Peihao
    Chen, Liangwei
    Li, Changhao
    Li, Thomas H.
    Tan, Mingkui
    Gan, Chuang
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023 : 2235 - 2245
  • [8] Cut-in maneuver detection with self-supervised contrastive video representation learning
    Nalcakan, Yagiz
    Bastanlar, Yalin
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (06) : 2915 - 2923
  • [9] Cross-View Temporal Contrastive Learning for Self-Supervised Video Representation
    Wang, Lulu
    Xu, Zengmin
    Zhang, Xuelian
    Meng, Ruxing
    Lu, Tao
    [J]. Computer Engineering and Applications, 60 (18) : 158 - 166
  • [10] Attentive spatial-temporal contrastive learning for self-supervised video representation
    Yang, Xingming
    Xiong, Sixuan
    Wu, Kewei
    Shan, Dongfeng
    Xie, Zhao
    [J]. IMAGE AND VISION COMPUTING, 2023, 137