Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding

Cited by: 1
Authors
Jiang, Li [1]
Yang, Zetong [2]
Shi, Shaoshuai [1]
Golyanik, Vladislav [1]
Dai, Dengxin [1]
Schiele, Bernt [1]
Affiliations
[1] Saarland Informatics Campus, Max Planck Institute for Informatics, Saarbrücken, Germany
[2] CUHK, Hong Kong, People's Republic of China
DOI
10.1109/CVPR52729.2023.00119
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Masked signal modeling has greatly advanced self-supervised pre-training for language and 2D images. However, it is still not fully explored in 3D scene understanding. Thus, this paper introduces Masked Shape Prediction (MSP), a new framework to conduct masked signal modeling in 3D scenes. MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points. The context-enhanced shape target consisting of explicit shape context and implicit deep shape feature is proposed to facilitate exploiting contextual cues in shape prediction. Meanwhile, the pre-training architecture in MSP is carefully designed to alleviate the masked shape leakage from point coordinates. Experiments on multiple 3D understanding tasks on both indoor and outdoor datasets demonstrate the effectiveness of MSP in learning good feature representations to consistently boost downstream performance.
Pages: 1168-1178
Number of pages: 11