Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16
Authors
Hess, Georg [1, 2]
Jaxing, Johan [1]
Svensson, Elias [1]
Hagerman, David [1]
Petersson, Christoffer [1, 2]
Svensson, Lennart [1]
Affiliations
[1] Chalmers Univ Technol, Gothenburg, Sweden
[2] Zenseact, Gothenburg, Sweden
Funding
Swedish Research Council
DOI
10.1109/WACVW58289.2023.00039
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
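
The abstract's pre-training setup (voxelize a sparse point cloud, hide some non-empty voxels, and keep an empty/non-empty label per voxel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; their code is at the repository linked above. The function names, the voxel size, the grid extent, and the 0.7 mask ratio below are all hypothetical choices made for illustration only.

```python
import numpy as np


def voxelize(points, voxel_size, grid_min):
    """Assign each point to an integer voxel index.

    Returns the unique non-empty voxel indices (V, 3) and, for every
    point, the row of its voxel in that array (N,).
    """
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    return voxels, inverse.reshape(-1)


def mask_voxels(num_voxels, mask_ratio, rng):
    """Randomly pick a fraction of non-empty voxels to hide (True = masked)."""
    num_masked = int(round(mask_ratio * num_voxels))
    masked = np.zeros(num_voxels, dtype=bool)
    masked[rng.permutation(num_voxels)[:num_masked]] = True
    return masked


def occupancy_targets(voxels, grid_shape):
    """Dense boolean grid: True where a voxel contains at least one point."""
    occupancy = np.zeros(grid_shape, dtype=bool)
    occupancy[tuple(voxels.T)] = True
    return occupancy


# Toy example: 200 random points in a 4 x 4 x 4 grid of unit voxels.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 4.0, size=(200, 3))
voxels, point_to_voxel = voxelize(points, voxel_size=1.0, grid_min=np.zeros(3))
masked = mask_voxels(len(voxels), mask_ratio=0.7, rng=rng)  # assumed ratio
occupancy = occupancy_targets(voxels, grid_shape=(4, 4, 4))

# An encoder would see only the points in visible voxels; the points in
# masked voxels and the occupancy grid would serve as training targets.
visible_points = points[~masked[point_to_voxel]]
```

In a real pipeline the visible voxels would be embedded and passed to the detector backbone, with a decoder predicting the hidden points and the occupancy labels; those components are omitted here because the abstract does not specify them.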
Pages: 350-359
Page count: 10