Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

Cited by: 16
Authors:
Hess, Georg [1,2]
Jaxing, Johan [1]
Svensson, Elias [1]
Hagerman, David [1]
Petersson, Christoffer [1,2]
Svensson, Lennart [1]
Affiliations:
[1] Chalmers Univ Technol, Gothenburg, Sweden
[2] Zenseact, Gothenburg, Sweden
Funding:
Swedish Research Council
DOI:
10.1109/WACVW58289.2023.00039
CLC Number:
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training, as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has so far focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for automotive point clouds, which are sparse and whose point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
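Sketch: the abstract outlines two pre-training objectives, reconstructing the contents of masked non-empty voxels and classifying voxels as empty or non-empty. Below is a minimal PyTorch sketch of the random voxel-masking step only, written for illustration; it is not the authors' implementation (see the linked repository for that), and the function name mask_voxels and the 0.7 mask ratio are assumptions.

import torch

def mask_voxels(voxel_features: torch.Tensor, mask_ratio: float = 0.7):
    # Split the N non-empty voxels into a visible set (fed to the
    # Transformer encoder) and a masked set (reconstruction targets).
    n = voxel_features.shape[0]
    n_visible = max(1, int(round(n * (1.0 - mask_ratio))))
    perm = torch.randperm(n)           # random voxel order
    visible_idx = perm[:n_visible]     # voxels the encoder sees
    masked_idx = perm[n_visible:]      # voxels to reconstruct
    return voxel_features[visible_idx], visible_idx, masked_idx

# Toy usage: 1000 non-empty voxels with 64-dim embeddings each.
feats = torch.randn(1000, 64)
visible, vis_idx, msk_idx = mask_voxels(feats, mask_ratio=0.7)
print(visible.shape, msk_idx.numel())  # torch.Size([300, 64]) 700

A full pipeline would additionally decode the masked positions and train with a reconstruction loss, plus a binary loss for the empty/non-empty distinction described in the abstract.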
Pages: 350-359 (10 pages)
Related Papers
50 records in total
  • [31] UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
    Li, Zhaowen
    Zhu, Yousong
    Yang, Fan
    Li, Wei
    Zhao, Chaoyang
    Chen, Yingying
    Chen, Zhiyang
    Xie, Jiahao
    Wu, Liwei
    Zhao, Rui
    Tang, Ming
    Wang, Jinqiao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 14607-14616
  • [32] Representation Recovering for Self-Supervised Pre-training on Medical Images
    Yan, Xiangyi
    Naushad, Junayed
    Sun, Shanlin
    Han, Kun
    Tang, Hao
    Kong, Deying
    Ma, Haoyu
    You, Chenyu
    Xie, Xiaohui
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 2684-2694
  • [33] Reducing Domain mismatch in Self-supervised speech pre-training
    Baskar, Murali Karthick
    Rosenberg, Andrew
    Ramabhadran, Bhuvana
    Zhang, Yu
    INTERSPEECH 2022, 2022: 3028-3032
  • [34] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 3023-3032
  • [35] A Self-Supervised Pre-Training Method for Chinese Spelling Correction
    Su J.
    Yu S.
    Hong X.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2023, 51 (09): 90-98
  • [36] Self-supervised VICReg pre-training for Brugada ECG detection
    Ronan, Robert
    Tarabanis, Constantine
    Chinitz, Larry
    Jankelson, Lior
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [37] Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene
    Shrestha, Sulabh
    Li, Yimeng
    Kosecka, Jana
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024: 625-635
  • [38] Self-supervised pre-training on industrial time-series
    Biggio, Luca
    Kastanis, Iason
    2021 8TH SWISS CONFERENCE ON DATA SCIENCE, SDS, 2021: 56-57
  • [39] DiT: Self-supervised Pre-training for Document Image Transformer
    Li, Junlong
    Xu, Yiheng
    Lv, Tengchao
    Cui, Lei
    Zhang, Cha
    Wei, Furu
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3530-3539
  • [40] CDS: Cross-Domain Self-supervised Pre-training
    Kim, Donghyun
    Saito, Kuniaki
    Oh, Tae-Hyun
    Plummer, Bryan A.
    Sclaroff, Stan
    Saenko, Kate
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 9103-9112