Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16
Authors:
Hess, Georg [1,2]
Jaxing, Johan [1]
Svensson, Elias [1]
Hagerman, David [1]
Petersson, Christoffer [1,2]
Svensson, Lennart [1]
Affiliations:
[1] Chalmers Univ Technol, Gothenburg, Sweden
[2] Zenseact, Gothenburg, Sweden
Funding:
Swedish Research Council
DOI:
10.1109/WACVW58289.2023.00039
CLC Number:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training, as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has so far focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for automotive point clouds, which are sparse and whose point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
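Sketch: the abstract outlines two pre-training objectives, reconstructing the contents of masked non-empty voxels and classifying voxels as empty or non-empty. Below is a minimal PyTorch sketch of the random voxel-masking step only, written for illustration; it is not the authors' implementation (see the linked repository for that), and the function name mask_voxels and the 0.7 mask ratio are assumptions.

import torch

def mask_voxels(voxel_features: torch.Tensor, mask_ratio: float = 0.7):
    # Split the N non-empty voxels into a visible set (fed to the
    # Transformer encoder) and a masked set (reconstruction targets).
    n = voxel_features.shape[0]
    n_visible = max(1, int(round(n * (1.0 - mask_ratio))))
    perm = torch.randperm(n)           # random voxel order
    visible_idx = perm[:n_visible]     # voxels the encoder sees
    masked_idx = perm[n_visible:]      # voxels to reconstruct
    return voxel_features[visible_idx], visible_idx, masked_idx

# Toy usage: 1000 non-empty voxels with 64-dim embeddings each.
feats = torch.randn(1000, 64)
visible, vis_idx, msk_idx = mask_voxels(feats, mask_ratio=0.7)
print(visible.shape, msk_idx.numel())  # torch.Size([300, 64]) 700

A full pipeline would additionally decode the masked positions and train with a reconstruction loss, plus a binary loss for the empty/non-empty distinction described in the abstract.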
Pages: 350-359 (10 pages)
Related Papers
50 records in total
  • [31] UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
    Li, Zhaowen
    Zhu, Yousong
    Yang, Fan
    Li, Wei
    Zhao, Chaoyang
    Chen, Yingying
    Chen, Zhiyang
    Xie, Jiahao
    Wu, Liwei
    Zhao, Rui
    Tang, Ming
    Wang, Jinqiao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 14607-14616
  • [32] Representation Recovering for Self-Supervised Pre-training on Medical Images
    Yan, Xiangyi
    Naushad, Junayed
    Sun, Shanlin
    Han, Kun
    Tang, Hao
    Kong, Deying
    Ma, Haoyu
    You, Chenyu
    Xie, Xiaohui
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 2684-2694
  • [33] Reducing Domain mismatch in Self-supervised speech pre-training
    Baskar, Murali Karthick
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Zhang, Yu
    INTERSPEECH 2022, 2022: 3028-3032
  • [34] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 3023-3032
  • [35] A Self-Supervised Pre-Training Method for Chinese Spelling Correction
    Su J.
    Yu S.
    Hong X.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2023, 51 (09): 90-98
  • [36] Self-supervised VICReg pre-training for Brugada ECG detection
    Ronan, Robert
    Tarabanis, Constantine
    Chinitz, Larry
    Jankelson, Lior
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [37] Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene
    Shrestha, Sulabh
    Li, Yimeng
    Kosecka, Jana
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024: 625-635
  • [38] Self-supervised pre-training on industrial time-series
    Biggio, Luca
    Kastanis, Iason
    2021 8TH SWISS CONFERENCE ON DATA SCIENCE, SDS, 2021: 56-57
  • [39] DiT: Self-supervised Pre-training for Document Image Transformer
    Li, Junlong
    Xu, Yiheng
    Lv, Tengchao
    Cui, Lei
    Zhang, Cha
    Wei, Furu
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3530-3539
  • [40] CDS: Cross-Domain Self-supervised Pre-training
    Kim, Donghyun
    Saito, Kuniaki
    Oh, Tae-Hyun
    Plummer, Bryan A.
    Sclaroff, Stan
    Saenko, Kate
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 9103-9112