Multi-Trusted Cross-Modal Information Bottleneck for 3D self-supervised representation learning

Authors
Cheng, Haozhe [1]
Han, Xu [1]
Shi, Pengcheng [1]
Zhu, Jihua [1]
Li, Zhongyu [1]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, People's Republic of China
Keywords
3D point cloud; Self-supervised network; Representation learning
DOI
10.1016/j.knosys.2023.111217
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Mainstream 2D-3D multi-modal contrastive learning methods cluster the features extracted from different modalities, such as color and spatial coordinates, by similarity in order to capture modality representations. However, because the modalities differ inherently and the data are noisy, not all information benefits the contrastive task, and some of it may cause overfitting. Furthermore, improper fusion of multi-modal representations undermines information integrity. To address these challenges, this paper proposes a new 3D self-supervised contrastive learning method called Multi-Trusted Cross-Modal Information Bottleneck (MCIB), which filters out irrelevant information and fuses multi-modal features guided by belief and uncertainty. On the one hand, the Multi-Modal Information Bottleneck (MMIB) suppresses useless information that disturbs the contrast by defining a lower bound on information propagation, which improves representation robustness and alleviates overfitting. On the other hand, Multi-Trusted Contrastive Learning (MTCL) treats the filtered descriptors as trusted evidence and estimates the uncertainty of each modality through a Dirichlet distribution transformation. Dempster-Shafer theory then integrates the probability distributions of the multi-modal representations according to belief and uncertainty, which yields trusted contrastive clustering. Empirical experiments, ablation studies, confirmatory experiments, and robustness tests on public datasets with different backbones confirm the strong performance and robustness of MCIB and its sub-methods, MMIB and MTCL, in object classification, few-shot classification, and part segmentation.
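The evidence-to-belief step and the Dempster-Shafer fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the subjective-logic formulation commonly used in trusted multi-view learning (Dirichlet parameters alpha = evidence + 1, uncertainty u = K / S) and the reduced Dempster combination rule for two opinions; the abstract does not give MCIB's exact formulas, so the function names, the evidence values, and the two-modality setup below are hypothetical.

```python
import numpy as np


def opinion_from_evidence(evidence):
    """Turn non-negative evidence over K classes into belief masses and uncertainty.

    Assumed formulation (not confirmed by the abstract): alpha = e + 1,
    S = sum(alpha), b_k = e_k / S, u = K / S, so that sum(b) + u = 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.shape[-1]
    alpha = evidence + 1.0            # Dirichlet concentration parameters
    strength = alpha.sum()            # Dirichlet strength S
    belief = evidence / strength
    uncertainty = num_classes / strength
    return belief, uncertainty


def combine_opinions(b1, u1, b2, u2):
    """Fuse two opinions with the reduced Dempster combination rule."""
    # Conflict: mass the two opinions assign to different classes.
    conflict = b1.sum() * b2.sum() - (b1 * b2).sum()
    scale = 1.0 - conflict            # strictly positive because u1, u2 > 0
    belief = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    uncertainty = (u1 * u2) / scale
    return belief, uncertainty


# Hypothetical evidence for 3 classes from two modalities.
b_pc, u_pc = opinion_from_evidence([9.0, 1.0, 0.0])    # confident point-cloud branch
b_img, u_img = opinion_from_evidence([1.0, 1.0, 1.0])  # undecided image branch
fused_b, fused_u = combine_opinions(b_pc, u_pc, b_img, u_img)
```

In this sketch the fused opinion follows the more confident modality, and the fused belief masses plus the fused uncertainty still sum to one, which is the property that lets belief and uncertainty guide the fusion.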
Pages: 14
Related papers
50 results in total
  • [21] Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search
    Liang, Meiyu
    Du, Junping
    Liang, Zhengyang
    Xing, Yongwang
    Huang, Wei
    Xue, Zhe
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13744 - 13753
  • [22] Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning for Point Cloud Understanding
    Wu, Yue
    Liu, Jiaming
    Gong, Maoguo
    Gong, Peiran
    Fan, Xiaolong
    Qin, A. K.
    Miao, Qiguang
    Ma, Wenping
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1626 - 1638
  • [23] A SELF-SUPERVISED CROSS-MODAL REMOTE SENSING FOUNDATION MODEL WITH MULTI-DOMAIN REPRESENTATION AND CROSS-DOMAIN FUSION
    Feng, Yingchao
    Wang, Peijin
    Diao, Wenhui
    He, Qibin
    Hu, Huiyang
    Bi, Hanbo
    Sun, Xian
    Fu, Kun
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 2239 - 2242
  • [24] Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval
    Chung, Soo-Whan
    Chung, Joon Son
    Kang, Hong-Goo
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (03) : 568 - 576
  • [25] Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Li, Ning
    Liu, Wei
    Gao, Xinbo
    Tao, Dacheng
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4242 - 4251
  • [26] Self-Supervised Cross-Modal Distillation for Thermal Infrared Tracking
    Zha, Yufei
    Sun, Jingxian
    Zhang, Peng
    Zhang, Lichao
    Gonzalez-Garcia, Abel
    Huang, Wei
    [J]. IEEE MULTIMEDIA, 2022, 29 (04) : 80 - 96
  • [27] Multi-label enhancement based self-supervised deep cross-modal hashing
    Zou, Xitao
    Wu, Song
    Bakker, Erwin M.
    Wang, Xinzhi
    [J]. NEUROCOMPUTING, 2022, 467 : 138 - 162
  • [29] Self-supervised Secondary Landmark Detection via 3D Representation Learning
    Bala, Praneet
    Zimmermann, Jan
    Park, Hyun Soo
    Hayden, Benjamin Y.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (08) : 1980 - 1994
  • [30] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
    Yang, Siyuan
    Liu, Jun
    Lu, Shijian
    Hwa, Er Meng
    Hu, Yongjian
    Kot, Alex C.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524