CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation

Cited by: 8
Authors
Mao, Yunyao [1]
Zhou, Wengang [1, 2]
Lu, Zhenbo [2]
Deng, Jiajun [1]
Li, Houqiang [1, 2]
Affiliations
[1] Univ Sci & Technol China, EEIS Dept, CAS Key Lab Technol GIPAS, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Self-supervised 3D action recognition; Contrastive learning
DOI
10.1007/978-3-031-20062-5_42
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In 3D action recognition, skeleton modalities carry rich complementary information. Nevertheless, how to model and exploit this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate cross-modal interaction as a bidirectional knowledge distillation problem. Unlike classic distillation solutions that transfer the knowledge of a fixed, pre-trained teacher to the student, our approach continuously updates the knowledge and distills it bidirectionally between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. First, the neighboring similarity distribution is introduced to model the knowledge learned in each modality; this relational information is naturally suited to contrastive frameworks. Second, asymmetrical configurations are used for the teacher and the student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerate version of our CMD. We perform extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD.
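The two designs named in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal pure-Python illustration under stated assumptions. It assumes that the neighboring similarity distribution is a softmax over cosine similarities between one embedding and a shared set of anchor samples, that the teacher/student asymmetry takes the form of a lower softmax temperature for the teacher, and that distillation is a KL divergence applied in both directions. The function names, the temperature values, the toy 2-D embeddings, and the modality names "joint" and "motion" are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(scores, temperature):
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_similarity_distribution(z, anchors, temperature):
    """Distribution over anchors given by cosine similarity to embedding z."""
    z = l2_normalize(z)
    sims = [sum(a * b for a, b in zip(z, l2_normalize(anchor)))
            for anchor in anchors]
    return softmax(sims, temperature)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def mutual_distillation_loss(z_a, z_b, anchors_a, anchors_b,
                             tau_teacher=0.05, tau_student=0.1):
    """Bidirectional distillation between modalities a and b.

    Assumption: anchors_a[i] and anchors_b[i] embed the same sample in each
    modality, so the two distributions are defined over the same neighbors.
    Assumption: the teacher uses the lower (sharper) temperature. In a real
    training loop the teacher side would also be excluded from gradients.
    """
    teacher_a = neighbor_similarity_distribution(z_a, anchors_a, tau_teacher)
    teacher_b = neighbor_similarity_distribution(z_b, anchors_b, tau_teacher)
    student_a = neighbor_similarity_distribution(z_a, anchors_a, tau_student)
    student_b = neighbor_similarity_distribution(z_b, anchors_b, tau_student)
    # Modality a teaches b, and modality b teaches a.
    return (kl_divergence(teacher_a, student_b)
            + kl_divergence(teacher_b, student_a))


# Toy 2-D embeddings of one sample and three shared anchors per modality.
z_joint, z_motion = [1.0, 0.2], [0.3, 1.0]
anchors_joint = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
anchors_motion = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
loss = mutual_distillation_loss(z_joint, z_motion, anchors_joint, anchors_motion)
```

Under these assumptions, as the teacher temperature approaches zero its distribution collapses onto the single most similar anchor, which gives one intuition for why hard cross-modal positive mining can be read as a degenerate case of the soft distribution matching described above.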
Pages: 734-752
Number of pages: 19
Related papers
50 in total
  • [1] Trusted 3D self-supervised representation learning with cross-modal settings
    Han, Xu
    Cheng, Haozhe
    Shi, Pengcheng
    Zhu, Jihua
    [J]. MACHINE VISION AND APPLICATIONS, 2024, 35 (04)
  • [2] Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning
    Cheng, Haozhe
    Han, Xu
    Shi, Pengcheng
    Zhu, Jihua
    Li, Zhongyu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 283
  • [3] Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning
    Cheng, Haozhe
    Han, Xu
    Shi, Pengcheng
    Zhu, Jihua
    Li, Zhongyu
    [J]. Knowledge-Based Systems, 2024, 283
  • [4] Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
    Das, Srijan
    Ryoo, Michael
    [J]. 2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023,
  • [5] Learning Mutual Modulation for Self-supervised Cross-Modal Super-Resolution
    Dong, Xiaoyu
    Yokoya, Naoto
    Wang, Longguang
    Uezato, Tatsumi
    [J]. COMPUTER VISION, ECCV 2022, PT XIX, 2022, 13679 : 1 - 18
  • [6] Self-supervised Exclusive Learning for 3D Segmentation with Cross-modal Unsupervised Domain Adaptation
    Zhang, Yachao
    Li, Miaoyu
    Xie, Yuan
    Li, Cuihua
    Wang, Cong
    Zhang, Zhizhong
    Qu, Yanyun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3338 - 3346
  • [7] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902
  • [8] Self-Supervised Cross-Modal Distillation for Thermal Infrared Tracking
    Zha, Yufei
    Sun, Jingxian
    Zhang, Peng
    Zhang, Lichao
    Gonzalez-Garcia, Abel
    Huang, Wei
    [J]. IEEE MULTIMEDIA, 2022, 29 (04) : 80 - 96
  • [9] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [10] Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery
    Wu, Jie Ying
    Tamhane, Aniruddha
    Kazanzides, Peter
    Unberath, Mathias
    [J]. INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (05) : 779 - 787