Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16
Authors
Hess, Georg [1 ,2 ]
Jaxing, Johan [1 ]
Svensson, Elias [1 ]
Hagerman, David [1 ]
Petersson, Christoffer [1 ,2 ]
Svensson, Lennart [1 ]
Affiliations
[1] Chalmers Univ Technol, Gothenburg, Sweden
[2] Zenseact, Gothenburg, Sweden
Funding
Swedish Research Council
DOI
10.1109/WACVW58289.2023.00039
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
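
To make the pre-training objective concrete, below is a minimal PyTorch sketch of the ingredients the abstract describes: random masking of non-empty voxels, a set-to-set reconstruction loss for the points in masked voxels, and a binary occupancy loss that distinguishes empty from non-empty voxels. The tensor shapes, function names, and the 0.7 mask ratio are illustrative assumptions rather than the authors' implementation; the linked repository contains the actual code.

import torch
import torch.nn.functional as F

def mask_voxels(voxel_feats: torch.Tensor, mask_ratio: float = 0.7):
    """Randomly hide a fraction of the non-empty voxels of one scene.

    voxel_feats: (N, C) features of the N non-empty voxels.
    Returns the visible features and a boolean mask over all N voxels
    (True = masked, i.e. to be reconstructed by the decoder).
    mask_ratio is an illustrative value, not necessarily the paper's.
    """
    n = voxel_feats.shape[0]
    num_masked = int(mask_ratio * n)
    perm = torch.randperm(n, device=voxel_feats.device)
    mask = torch.zeros(n, dtype=torch.bool, device=voxel_feats.device)
    mask[perm[:num_masked]] = True
    return voxel_feats[~mask], mask

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between the predicted points of one
    masked voxel, pred: (P, 3), and its true points, gt: (Q, 3)."""
    d = torch.cdist(pred, gt)  # (P, Q) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def occupancy_loss(occ_logits: torch.Tensor, occ_target: torch.Tensor) -> torch.Tensor:
    """Binary classification: does a candidate voxel contain lidar points?
    occ_target is 1.0 for non-empty voxels and 0.0 for sampled empty ones."""
    return F.binary_cross_entropy_with_logits(occ_logits, occ_target)

# Toy usage with random tensors standing in for a voxelized lidar sweep.
feats = torch.randn(1000, 64)        # 1000 non-empty voxels, 64-dim features
visible, mask = mask_voxels(feats)
rec = chamfer_distance(torch.randn(8, 3), torch.randn(12, 3))
occ = occupancy_loss(torch.randn(200), torch.randint(0, 2, (200,)).float())
loss = rec + occ                     # combined pre-training objective

A set-to-set loss such as the Chamfer distance is a natural choice for the reconstruction term because the points inside a voxel have no canonical ordering to regress against.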
Pages: 350-359
Page count: 10
Related papers (50 in total)
  • [21] Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection
    Wang, Keran
    Xie, Hongtao
    Wang, Yuxin
    Zhang, Dongming
    Qu, Yadong
    Gao, Zuan
    Zhang, Yongdong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2006 - 2015
  • [22] Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
    Cheng, Jie
    Mei, Xiaodong
    Liu, Ming
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8645 - 8655
  • [23] Self-supervised Pre-training for Dealing with Small Datasets in Deep Learning for Medical Imaging: Evaluation of Contrastive and Masked Autoencoder Methods
    Wolf, Daniel
    Payer, Tristan
    Lisson, Catharina S.
    Lisson, Christoph G.
    Beer, Meinrad
    Goetz, Michael
    Ropinski, Timo
    BILDVERARBEITUNG FUR DIE MEDIZIN 2024, 2024, : 157 - 157
  • [24] Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data
    Yang, Hongxin
    Huang, Shangfeng
    Wang, Ruisheng
    Wang, Xin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [25] Mutual information-driven self-supervised point cloud pre-training
    Xu, Weichen
    Fu, Tianhao
    Cao, Jian
    Zhao, Xinyu
    Xu, Xinxin
    Cao, Xixin
    Zhang, Xing
    KNOWLEDGE-BASED SYSTEMS, 2025, 307
  • [26] Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
    Jiang, Li
    Yang, Zetong
    Shi, Shaoshuai
    Golyanik, Vladislav
    Dai, Dengxin
    Schiele, Bernt
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1168 - 1178
  • [27] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
    Tong, Zhan
    Song, Yibing
    Wang, Jue
    Wang, Limin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [28] Self-Supervised Pre-training for Time Series Classification
    Shi, Pengxiang
    Ye, Wenwen
    Qin, Zheng
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [29] LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
    Li, Chun-Liang
    Simon, Tomas
    Saragih, Jason
    Poczos, Barnabas
    Sheikh, Yaser
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11959 - 11968
  • [30] Object Adaptive Self-Supervised Dense Visual Pre-Training
    Zhang, Yu
    Zhang, Tao
    Zhu, Hongyuan
    Chen, Zihan
    Mi, Siya
    Peng, Xi
    Geng, Xin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 2228 - 2240