Masked Autoencoders in 3D Point Cloud Representation Learning

Cited by: 4
Authors
Jiang, Jincen [1 ]
Lu, Xuequan [2 ]
Zhao, Lizhi [1 ]
Dazeley, Richard [3 ]
Wang, Meili [1 ]
Affiliations
[1] NorthWest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic 3000, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
Keywords
Point cloud compression; Transformers; Task analysis; Feature extraction; Three-dimensional displays; Solid modeling; Decoding; Self-supervised learning; point cloud; completion; NETWORK;
DOI
10.1109/TMM.2023.3314973
CLC number
TP [automation technology; computer technology];
Subject classification code
0812;
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked autoencoders have been explored in this field. The input can be masked intuitively when its content is regular, as with word sequences and 2D pixel grids; extending the idea to 3D point clouds, however, is challenging due to their irregularity. In this article, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract features from the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module with a multi-task loss completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with a point cloud completion pretext task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features MAE3D extracts from point cloud patches benefit downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively).
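The patch-splitting and masking step described in the abstract can be sketched in a few lines of NumPy. This is not the authors' released implementation; it is a minimal illustration assuming the common recipe for point-cloud MAE pipelines: farthest-point sampling picks patch centers, each patch is the k nearest neighbors of its center, and a random subset of patches is masked. The function names (`farthest_point_sample`, `split_and_mask`) and parameter defaults are hypothetical.

```python
import numpy as np

def farthest_point_sample(points, n_centers, seed=0):
    """Greedy farthest-point sampling: pick patch centers spread over the cloud.
    Hypothetical helper, not from the paper's codebase."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    centers = [int(rng.integers(n))]
    dists = np.full(n, np.inf)
    for _ in range(n_centers - 1):
        # distance of every point to its nearest already-chosen center
        dists = np.minimum(dists, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(np.argmax(dists)))
    return np.array(centers)

def split_and_mask(points, n_patches=64, patch_size=32, mask_ratio=0.75, seed=0):
    """Split a point cloud (N, 3) into kNN patches around FPS centers,
    then randomly mask a mask_ratio fraction of the patches."""
    rng = np.random.default_rng(seed)
    centers = farthest_point_sample(points, n_patches, seed)
    # each patch = the patch_size nearest neighbors of its center point
    d = np.linalg.norm(points[None, :, :] - points[centers][:, None, :], axis=-1)
    patches = points[np.argsort(d, axis=1)[:, :patch_size]]  # (n_patches, patch_size, 3)
    n_masked = int(mask_ratio * n_patches)
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, n_masked, replace=False)] = True
    # visible patches go to the encoder; masked patches are reconstruction targets
    return patches[~mask], patches[mask], mask

cloud = np.random.default_rng(1).standard_normal((1024, 3)).astype(np.float32)
visible, masked, mask = split_and_mask(cloud)
print(visible.shape, masked.shape, int(mask.sum()))  # (16, 32, 3) (48, 32, 3) 48
```

With the default 75% mask ratio, only 16 of 64 patches reach the encoder; the decoder must complete the remaining 48, which is the completion pretext task the paper pre-trains on.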
Pages: 820-831
Page count: 12
Related papers
50 records in total
  • [1] Rethinking Masked Representation Learning for 3D Point Cloud Understanding
    Wang, Chuxin
    Zha, Yixin
    He, Jianfeng
    Yang, Wenfei
    Zhang, Tianzhu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 247 - 262
  • [2] Masked Structural Point Cloud Modeling to Learning 3D Representation
    Yamada, Ryosuke
    Tadokoro, Ryu
    Qiu, Yue
    Kataoka, Hirokatsu
    Satoh, Yutaka
    IEEE ACCESS, 2024, 12 : 142291 - 142305
  • [3] PatchMixing Masked Autoencoders for 3D Point Cloud Self-Supervised Learning
    Lin, Chengxing
    Xu, Wenju
    Zhu, Jian
    Nie, Yongwei
    Cai, Ruichu
    Xu, Xuemiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9882 - 9897
  • [4] T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
    Wei, Weijie
    Nejadasl, Fatemeh Karimi
    Gevers, Theo
    Oswald, Martin R.
    COMPUTER VISION - ECCV 2024, PT XI, 2025, 15069 : 178 - 195
  • [5] PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
    Chen, Anthony
    Zhang, Kevin
    Zhang, Renrui
    Wang, Zihan
    Lu, Yuheng
    Guo, Yandong
    Zhang, Shanghang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 5291 - 5301
  • [6] Feature Visualization for 3D Point Cloud Autoencoders
    Rios, Thiago
    van Stein, Bas
    Menzel, Stefan
    Baeck, Thomas
    Sendhoff, Bernhard
    Wollstadt, Patricia
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [7] Joint representation learning for text and 3D point cloud
    Huang, Rui
    Pan, Xuran
    Zheng, Henry
    Jiang, Haojun
    Xie, Zhifeng
    Wu, Cheng
    Song, Shiji
    Huang, Gao
    PATTERN RECOGNITION, 2024, 147
  • [8] Geometric Invariant Representation Learning for 3D Point Cloud
    Li, Zongmin
    Zhang, Yupeng
    Bai, Yun
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1480 - 1485
  • [9] Masked Autoencoders for Point Cloud Self-supervised Learning
    Pang, Yatian
    Wang, Wenxiao
    Tay, Francis E. H.
    Liu, Wei
    Tian, Yonghong
    Yuan, Li
    COMPUTER VISION - ECCV 2022, PT II, 2022, 13662 : 604 - 621
  • [10] Scalability of Learning Tasks on 3D CAE Models Using Point Cloud Autoencoders
    Rios, Thiago
    Wollstadt, Patricia
    van Stein, Bas
    Baeck, Thomas
    Xu, Zhao
    Sendhoff, Bernhard
    Menzel, Stefan
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 1367 - 1374