Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16

Authors:

Hess, Georg [1,2]

Jaxing, Johan [1]

Svensson, Elias [1]

Hagerman, David [1]

Petersson, Christoffer [1,2]

Svensson, Lennart [1]

Affiliations:

[1] Chalmers University of Technology, Gothenburg, Sweden

[2] Zenseact, Gothenburg, Sweden

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW) | 2023

Funding:

Swedish Research Council

DOI:

10.1109/WACVW58289.2023.00039

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
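The pre-training idea described in the abstract (mask voxels, reconstruct them, and separate empty from non-empty voxels) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the voxel size, the masking ratio, and the toy points are all assumptions chosen for illustration, and the encoder, decoder, and loss are left out entirely. It only shows how the inputs and targets for such a scheme could be prepared in plain Python.

```python
import random
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group (x, y, z) points by the integer index of the voxel that contains them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def mask_voxels(voxels, mask_ratio, rng):
    """Randomly split non-empty voxels into visible inputs and masked targets."""
    keys = sorted(voxels)
    rng.shuffle(keys)
    num_masked = int(round(mask_ratio * len(keys)))
    masked = set(keys[:num_masked])
    visible = {k: v for k, v in voxels.items() if k not in masked}
    targets = {k: voxels[k] for k in masked}
    return visible, targets


def occupancy_labels(voxels, candidate_keys):
    """Label each candidate voxel 1 if it holds points in the full scene, else 0."""
    return {k: int(k in voxels) for k in candidate_keys}


# Toy scene (hypothetical values): two points share a voxel, three are isolated.
points = [
    (0.1, 0.2, 0.0),
    (0.15, 0.22, 0.05),
    (1.3, 0.4, 0.1),
    (2.7, 2.1, 0.3),
    (5.0, 5.2, 0.9),
]
voxels = voxelize(points, voxel_size=0.5)  # assumed voxel size
visible, targets = mask_voxels(voxels, mask_ratio=0.5, rng=random.Random(0))
# Candidates include one voxel index that contains no points.
labels = occupancy_labels(voxels, list(voxels) + [(9, 9, 9)])
```

In a real pipeline, a model would see only `visible`, be trained to reconstruct the points in `targets`, and be trained to predict `labels` for candidate voxel positions. The masking ratio and the choice of candidate positions here are placeholders, not values taken from the paper.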


Pages: 350-359

Page count: 10

