Mutual Information Driven Equivariant Contrastive Learning for 3D Action Representation Learning

Cited by: 0
Authors
Lin, Lilang [1]
Zhang, Jiahang [1]
Liu, Jiaying [1]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised learning; Skeleton; Task analysis; Representation learning; Data models; Three-dimensional displays; Convolutional neural networks; skeleton-based action recognition; contrastive learning; LSTM;
DOI
10.1109/TIP.2024.3372451
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised contrastive learning has proven to be successful for skeleton-based action recognition. For contrastive learning, data transformations are found to fundamentally affect the learned representation quality. However, traditional invariant contrastive learning is detrimental to the performance on the downstream task if the transformation carries important information for the task. In this sense, it limits the application of many data transformations in the current contrastive learning pipeline. To address these issues, we propose to utilize equivariant contrastive learning, which extends invariant contrastive learning and preserves important information. By integrating equivariant and invariant contrastive learning into a hybrid approach, the model can better leverage the motion patterns exposed by data transformations and obtain a more discriminative representation space. Specifically, a self-distillation loss is first proposed for transformed data of different intensities to fully utilize invariant transformations, especially strong invariant transformations. For equivariant transformations, we explore the potential of skeleton mixing and temporal shuffling for equivariant contrastive learning. Meanwhile, we analyze the impacts of different data transformations on the feature space in terms of two novel metrics proposed in this paper, namely, consistency and diversity. In particular, we demonstrate that equivariant learning boosts performance by alleviating the dimensional collapse problem. Experimental results on several benchmarks indicate that our method outperforms existing state-of-the-art methods.
Pages: 1883-1897
Number of pages: 15