Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning

Cited by: 0
Authors
Cheng, Haozhe [1 ]
Han, Xu [1 ]
Shi, Pengcheng [1 ]
Zhu, Jihua [1 ]
Li, Zhongyu [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
Keywords
3D point cloud; Self-supervised network; Representation learning;
DOI
10.1016/j.knosys.2023.111217
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Mainstream 2D-3D multi-modal contrastive learning methods perform similarity clustering on features extracted from different modalities, such as color and spatial coordinates, to capture modality representations. However, the inherent differences and noise across modalities mean that not all information benefits the contrastive task, and irrelevant information can cause overfitting. Furthermore, improper fusion of multi-modal representations undermines information integrity. To address these challenges, this paper proposes a new 3D self-supervised contrastive learning method, the Multi-Trusted Cross-Modal Information Bottleneck (MCIB), which filters out irrelevant information and fuses multi-modal features guided by belief and uncertainty. On the one hand, the Multi-Modal Information Bottleneck (MMIB) suppresses useless information that disturbs the contrastive task by defining a lower bound on information propagation, which improves representation robustness and alleviates overfitting. On the other hand, Multi-Trusted Contrastive Learning (MTCL) regards the filtered descriptors as trusted evidence and explores the uncertainty of each modality through a Dirichlet distribution transformation. Dempster-Shafer theory then integrates the probability distributions of the multi-modal representations according to belief and uncertainty, achieving trusted contrastive clustering. Empirical experiments, ablation studies, confirmatory experiments, and robustness tests on public datasets with different backbones confirm the strong performance and robustness of MCIB and its sub-methods, MMIB and MTCL, on object classification, few-shot classification, and part segmentation.
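The abstract describes MMIB only at the level of bounding information flow, so the exact objective is not recoverable from this record. As a point of reference, one common way to implement an information-bottleneck penalty alongside a contrastive loss is the variational KL term sketched below; the Gaussian posterior, standard-normal prior, function names, and the `beta` weighting are illustrative assumptions, not the paper's formulation.

```python
# Hypothetical variational information-bottleneck penalty (PyTorch sketch).
# Each modality encoder is assumed to emit (mu, logvar) for a stochastic
# code z; the KL term penalizes how much of the input the code retains,
# squeezing out noise that would otherwise leak into the contrastive task.
import torch

def vib_kl(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1).mean()

def sample_z(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Reparameterized sample z = mu + sigma * eps, keeping gradients."""
    return mu + (0.5 * logvar).exp() * torch.randn_like(mu)

# Hypothetical usage, with contrastive_loss and the 2D/3D heads assumed:
#   total = contrastive_loss(z_2d, z_3d) \
#         + beta * (vib_kl(mu_2d, lv_2d) + vib_kl(mu_3d, lv_3d))
```

In such a setup `beta` trades off compression against contrastive fit: a larger value filters more aggressively at the cost of discarding potentially useful signal.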
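MTCL is specified more concretely: filtered descriptors become evidence, a Dirichlet transformation yields per-modality belief and uncertainty, and Dempster-Shafer theory fuses the modalities. The sketch below follows the standard subjective-logic reading of those steps (as in trusted multi-view classification); the softplus evidence head, the class count, and the two-opinion reduced combination rule are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch: Dirichlet evidence, belief/uncertainty, and two-modality
# Dempster-Shafer combination (PyTorch). Shapes: logits (B, K).
import torch
import torch.nn.functional as F

def dirichlet_params(logits: torch.Tensor) -> torch.Tensor:
    """Map raw logits to Dirichlet parameters alpha = evidence + 1."""
    return F.softplus(logits) + 1.0  # softplus keeps evidence non-negative

def belief_and_uncertainty(alpha: torch.Tensor):
    """Subjective-logic reading: b_k = e_k / S, u = K / S, S = sum_k alpha_k,
    so that sum_k b_k + u = 1 for every sample."""
    K = alpha.shape[-1]
    S = alpha.sum(dim=-1, keepdim=True)
    belief = (alpha - 1.0) / S   # evidence divided by Dirichlet strength
    uncertainty = K / S          # vacuity: little total evidence -> high u
    return belief, uncertainty

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer rule for two opinions over the same classes:
    conflicting mass C is measured and renormalized away."""
    outer = torch.bmm(b1.unsqueeze(2), b2.unsqueeze(1))            # (B, K, K)
    C = outer.sum(dim=(1, 2)) - outer.diagonal(dim1=1, dim2=2).sum(dim=1)
    C = C.unsqueeze(-1)                                            # (B, 1)
    b = (b1 * b2 + b1 * u2 + b2 * u1) / (1.0 - C)
    u = (u1 * u2) / (1.0 - C)
    return b, u

if __name__ == "__main__":
    B, K = 4, 10
    logits_pc = torch.randn(B, K)    # e.g. point-cloud branch head (assumed)
    logits_img = torch.randn(B, K)   # e.g. rendered-image branch head (assumed)
    b1, u1 = belief_and_uncertainty(dirichlet_params(logits_pc))
    b2, u2 = belief_and_uncertainty(dirichlet_params(logits_img))
    b, u = ds_combine(b1, u1, b2, u2)
    print((b.sum(-1, keepdim=True) + u).squeeze(-1))  # each entry ~= 1.0
```

The invariant sum_k b_k + u = 1 holds before and after combination; the conflict term C discounts mass the two modalities assign to different classes, so a confident but contradicted modality loses influence in the fused opinion.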
Pages: 14
Related papers (50 in total)
  • [1] Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning
    Cheng, Haozhe
    Han, Xu
    Shi, Pengcheng
    Zhu, Jihua
    Li, Zhongyu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 283
  • [2] Trusted 3D self-supervised representation learning with cross-modal settings
    Han, Xu
    Cheng, Haozhe
    Shi, Pengcheng
    Zhu, Jihua
    [J]. MACHINE VISION AND APPLICATIONS, 2024, 35 (04)
  • [3] CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation
    Mao, Yunyao
    Zhou, Wengang
    Lu, Zhenbo
    Deng, Jiajun
    Li, Houqiang
    [J]. COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 734 - 752
  • [4] Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
    Das, Srijan
    Ryoo, Michael
    [J]. 2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023
  • [5] Self-supervised Exclusive Learning for 3D Segmentation with Cross-modal Unsupervised Domain Adaptation
    Zhang, Yachao
    Li, Miaoyu
    Xie, Yuan
    Li, Cuihua
    Wang, Cong
    Zhang, Zhizhong
    Qu, Yanyun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 3338 - 3346
  • [6] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902
  • [7] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [8] Self-Supervised Graph Representation Learning via Information Bottleneck
    Gu, Junhua
    Zheng, Zichen
    Zhou, Wenmiao
    Zhang, Yajuan
    Lu, Zhengjun
    Yang, Liang
    [J]. SYMMETRY-BASEL, 2022, 14 (04)
  • [9] Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery
    Wu, Jie Ying
    Tamhane, Aniruddha
    Kazanzides, Peter
    Unberath, Mathias
    [J]. INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (05) : 779 - 787