Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning

Cited by: 0
Authors
Cheng, Haozhe [1]
Han, Xu [1]
Shi, Pengcheng [1]
Zhu, Jihua [1]
Li, Zhongyu [1]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
Keywords
3D point cloud; Self-supervised network; Representation learning
DOI
10.1016/j.knosys.2023.111217
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mainstream 2D-3D multi-modal contrastive learning methods perform similarity clustering on features extracted from different modalities, such as color and spatial coordinates, to capture modality representations. However, because of inherent differences between the modalities and noise in the data, not all information benefits the contrastive task, and some of it may cause overfitting. Furthermore, improper fusion of multi-modal representations undermines information integrity. To address these challenges, this paper proposes a new 3D self-supervised contrastive learning method called Multi-Trusted Cross-Modal Information Bottleneck (MCIB), which filters out irrelevant information and fuses multi-modal features guided by belief and uncertainty. On the one hand, the Multi-Modal Information Bottleneck (MMIB) suppresses useless information that disturbs the contrastive task by defining a lower bound on information propagation, which improves representation robustness and alleviates overfitting. On the other hand, Multi-Trusted Contrastive Learning (MTCL) treats the filtered descriptors as trusted evidence and explores the uncertainty of each modality through a Dirichlet distribution transformation. Dempster-Shafer theory then integrates the probability distributions of the multi-modal representations according to belief and uncertainty, which yields trusted contrastive clustering. Empirical experiments, ablation studies, confirmatory experiments, and robustness tests on public datasets with different backbones confirm the exceptional performance and robustness of MCIB and its sub-methods, MMIB and MTCL, in object classification, few-shot classification, and part segmentation.
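The evidence-to-uncertainty step and the Dempster-Shafer fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the formulation commonly used in evidential deep learning: Dirichlet parameters alpha_k = e_k + 1, belief b_k = e_k / S, uncertainty u = K / S, and the reduced Dempster combination rule for two opinions. The record does not give the paper's exact equations, so the function names `opinion_from_evidence` and `combine_opinions`, the evidence values, and the two-modality setup are hypothetical and not the authors' implementation.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Assumed formulation: alpha_k = e_k + 1 and S = sum(alpha), so that
    b_k = e_k / S, u = K / S, and sum(b) + u == 1.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def combine_opinions(b1, u1, b2, u2):
    """Fuse two opinions with the reduced Dempster combination rule."""
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (p * q + p * u2 + q * u1) for p, q in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty


# Hypothetical evidence for three classes from two modalities.
image_beliefs, image_u = opinion_from_evidence([8.0, 1.0, 1.0])
point_beliefs, point_u = opinion_from_evidence([4.0, 0.0, 1.0])
fused_beliefs, fused_u = combine_opinions(
    image_beliefs, image_u, point_beliefs, point_u
)
```

Under this rule the fused uncertainty is never larger than that of either input, and a modality with high uncertainty contributes less to the fused beliefs.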
Pages: 14