Masked Autoencoders in 3D Point Cloud Representation Learning

Cited by: 4
Authors
Jiang, Jincen [1 ]
Lu, Xuequan [2 ]
Zhao, Lizhi [1 ]
Dazeley, Richard [3 ]
Wang, Meili [1 ]
Affiliations
[1] NorthWest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic 3000, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
Keywords
Point cloud compression; Transformers; Task analysis; Feature extraction; Three-dimensional displays; Solid modeling; Decoding; Self-supervised learning; point cloud; completion; network
DOI
10.1109/TMM.2023.3314973
CLC Number
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based upon masked Autoencoders have been explored in these fields. The input can be intuitively masked when its content is regular, such as word sequences and 2D pixels. However, extending this to 3D point clouds is challenging due to their irregularity. In this article, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of the unmasked patches. Secondly, we employ patch-wise MAE3D Transformers to learn both the local features of point cloud patches and the high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module with a multi-task loss completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with a point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features our MAE3D extracts from point cloud patches benefit downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively).
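The patch-split-and-mask step the abstract describes (split the cloud into patches, mask a portion, keep the rest for the encoder) can be sketched as follows. This is a minimal illustration only: the patch count, patch size, mask ratio, and the FPS-plus-kNN grouping are common choices in masked point-cloud pipelines and are assumptions here, not the paper's exact settings.

```python
import numpy as np

def farthest_point_sample(points, n_centers, seed=0):
    """Greedy farthest-point sampling: pick n_centers well-spread indices."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    centers = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(n_centers - 1):
        # Distance of every point to its nearest already-chosen center
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(dist.argmax()))
    return np.array(centers)

def split_and_mask(points, n_patches=64, patch_size=32, mask_ratio=0.75, seed=0):
    """Group points into kNN patches around FPS centers, then mask a random subset."""
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sample(points, n_patches, seed)]
    # kNN grouping: each patch is the patch_size points nearest to its center
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    patches = points[np.argsort(d, axis=1)[:, :patch_size]]  # (n_patches, patch_size, 3)
    # Randomly mask a fixed fraction of the patches
    n_masked = int(n_patches * mask_ratio)
    perm = rng.permutation(n_patches)
    visible, masked = patches[perm[n_masked:]], patches[perm[:n_masked]]
    return visible, masked, centers

pts = np.random.default_rng(1).normal(size=(1024, 3))
visible, masked, centers = split_and_mask(pts)
print(visible.shape, masked.shape)  # (16, 32, 3) (48, 32, 3): 75% of 64 patches masked
```

The visible patches would feed the embedding module and encoder, while the masked patches serve only as reconstruction targets for the completion pre-text task.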
Pages: 820-831
Page count: 12
Related Papers
50 items in total
  • [21] REPRESENTATION LEARNING OPTIMIZATION FOR 3D POINT CLOUD QUALITY ASSESSMENT WITHOUT REFERENCE
    Tliba, Marouane
    Chetouani, Aladine
    Valenzise, Giuseppe
    Dufaux, Frederic
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3702 - 3706
  • [22] GridNet: efficiently learning deep hierarchical representation for 3D point cloud understanding
    Wang, Huiqun
    Huang, Di
    Wang, Yunhong
    FRONTIERS OF COMPUTER SCIENCE, 2022, 16 (01)
  • [23] Representation Learning via Parallel Subset Reconstruction for 3D Point Cloud Generation
    Matsuzaki, Kohei
    Tasaka, Kazuyuki
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 289 - 296
  • [24] Quadratic Terms Based Point-to-Surface 3D Representation for Deep Learning of Point Cloud
    Sun, Tiecheng
    Liu, Guanghui
    Li, Ru
    Liu, Shuaicheng
    Zhu, Shuyuan
    Zeng, Bing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 2705 - 2718
  • [25] Enhancing Representation Learning of EEG Data with Masked Autoencoders
    Zhou, Yifei
    Liu, Sitong
    AUGMENTED COGNITION, PT II, AC 2024, 2024, 14695 : 88 - 100
  • [27] Bioinspired point cloud representation: 3D object tracking
    Orts-Escolano, Sergio
    Garcia-Rodriguez, Jose
    Cazorla, Miguel
    Morell, Vicente
    Azorin, Jorge
    Saval, Marcelo
    Garcia-Garcia, Alberto
    Villena, Victor
    NEURAL COMPUTING & APPLICATIONS, 2018, 29 (09): 663 - 672
  • [28] MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis
    Liang, Yaqian
    Zhao, Shanshan
    Yu, Baosheng
    Zhang, Jing
    He, Fazhi
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 37 - 54
  • [29] Rethinking Masked-Autoencoder-Based 3D Point Cloud Pretraining
    Cheng, Nuo
    Luo, Chuanyu
    Li, Xinzhe
    Hu, Ruizhi
    Li, Han
    Ma, Sikun
    Ren, Zhong
    Jiang, Haipeng
    Li, Xiaohan
    Lei, Shengguang
    Li, Pu
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 2763 - 2768
  • [30] Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
    Zhang, Renrui
    Wang, Liuhui
    Qiao, Yu
    Gao, Peng
    Li, Hongsheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21769 - 21780