Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16
Authors
Hess, Georg [1 ,2 ]
Jaxing, Johan [1 ]
Svensson, Elias [1 ]
Hagerman, David [1 ]
Petersson, Christoffer [1 ,2 ]
Svensson, Lennart [1 ]
Affiliations
[1] Chalmers Univ Technol, Gothenburg, Sweden
[2] Zenseact, Gothenburg, Sweden
Funding
Swedish Research Council
DOI
10.1109/WACVW58289.2023.00039
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
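
To make the pre-training objective concrete, below is a minimal PyTorch sketch of the ingredients the abstract describes: random masking of non-empty voxels, a set-to-set reconstruction loss for the points in masked voxels, and a binary occupancy loss that distinguishes empty from non-empty voxels. The tensor shapes, function names, and the 0.7 mask ratio are illustrative assumptions rather than the authors' implementation; the linked repository contains the actual code.

import torch
import torch.nn.functional as F

def mask_voxels(voxel_feats: torch.Tensor, mask_ratio: float = 0.7):
    """Randomly hide a fraction of the non-empty voxels of one scene.

    voxel_feats: (N, C) features of the N non-empty voxels.
    Returns the visible features and a boolean mask over all N voxels
    (True = masked, i.e. to be reconstructed by the decoder).
    mask_ratio is an illustrative value, not necessarily the paper's.
    """
    n = voxel_feats.shape[0]
    num_masked = int(mask_ratio * n)
    perm = torch.randperm(n, device=voxel_feats.device)
    mask = torch.zeros(n, dtype=torch.bool, device=voxel_feats.device)
    mask[perm[:num_masked]] = True
    return voxel_feats[~mask], mask

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between the predicted points of one
    masked voxel, pred: (P, 3), and its true points, gt: (Q, 3)."""
    d = torch.cdist(pred, gt)  # (P, Q) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def occupancy_loss(occ_logits: torch.Tensor, occ_target: torch.Tensor) -> torch.Tensor:
    """Binary classification: does a candidate voxel contain lidar points?
    occ_target is 1.0 for non-empty voxels and 0.0 for sampled empty ones."""
    return F.binary_cross_entropy_with_logits(occ_logits, occ_target)

# Toy usage with random tensors standing in for a voxelized lidar sweep.
feats = torch.randn(1000, 64)        # 1000 non-empty voxels, 64-dim features
visible, mask = mask_voxels(feats)
rec = chamfer_distance(torch.randn(8, 3), torch.randn(12, 3))
occ = occupancy_loss(torch.randn(200), torch.randint(0, 2, (200,)).float())
loss = rec + occ                     # combined pre-training objective

A set-to-set loss such as the Chamfer distance is a natural choice for the reconstruction term because the points inside a voxel have no canonical ordering to regress against.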
Pages: 350-359
Page count: 10
Related papers (50 in total)
  • [21] Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection
    Wang, Keran
    Xie, Hongtao
    Wang, Yuxin
    Zhang, Dongming
    Qu, Yadong
    Gao, Zuan
    Zhang, Yongdong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2006 - 2015
  • [22] Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders
    Cheng, Jie
    Mei, Xiaodong
    Liu, Ming
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8645 - 8655
  • [23] Self-supervised Pre-training for Dealing with Small Datasets in Deep Learning for Medical Imaging: Evaluation of Contrastive and Masked Autoencoder Methods
    Wolf, Daniel
    Payer, Tristan
    Lisson, Catharina S.
    Lisson, Christoph G.
    Beer, Meinrad
    Goetz, Michael
    Ropinski, Timo
    BILDVERARBEITUNG FUR DIE MEDIZIN 2024, 2024, : 157 - 157
  • [24] Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data
    Yang, Hongxin
    Huang, Shangfeng
    Wang, Ruisheng
    Wang, Xin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [25] Mutual information-driven self-supervised point cloud pre-training
    Xu, Weichen
    Fu, Tianhao
    Cao, Jian
    Zhao, Xinyu
    Xu, Xinxin
    Cao, Xixin
    Zhang, Xing
    KNOWLEDGE-BASED SYSTEMS, 2025, 307
  • [26] Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
    Jiang, Li
    Yang, Zetong
    Shi, Shaoshuai
    Golyanik, Vladislav
    Dai, Dengxin
    Schiele, Bernt
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1168 - 1178
  • [27] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
    Tong, Zhan
    Song, Yibing
    Wang, Jue
    Wang, Limin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [28] Self-Supervised Pre-training for Time Series Classification
    Shi, Pengxiang
    Ye, Wenwen
    Qin, Zheng
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [29] LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
    Li, Chun-Liang
    Simon, Tomas
    Saragih, Jason
    Poczos, Barnabas
    Sheikh, Yaser
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11959 - 11968
  • [30] Object Adaptive Self-Supervised Dense Visual Pre-Training
    Zhang, Yu
    Zhang, Tao
    Zhu, Hongyuan
    Chen, Zihan
    Mi, Siya
    Peng, Xi
    Geng, Xin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 2228 - 2240