Masked Autoencoders in 3D Point Cloud Representation Learning

Cited: 4
Authors
Jiang, Jincen [1 ]
Lu, Xuequan [2 ]
Zhao, Lizhi [1 ]
Dazeley, Richard [3 ]
Wang, Meili [1 ]
Affiliations
[1] NorthWest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic 3000, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
Keywords
Point cloud compression; Transformers; Task analysis; Feature extraction; Three-dimensional displays; Solid modeling; Decoding; Self-supervised learning; point cloud; completion; network
DOI
10.1109/TMM.2023.3314973
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked Autoencoders have been explored in this field. Inputs with regular structure, such as word sequences and 2D pixel grids, can be masked intuitively; extending masking to 3D point clouds, however, is challenging due to their irregularity. In this article, we propose Masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract features of the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module, trained with a multi-task loss, completes the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with a point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by MAE3D from point cloud patches benefit downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively).
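The abstract's first stage, splitting an irregular point cloud into local patches and masking a portion of them, can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code: the function names (`farthest_point_sample`, `make_patches`, `mask_patches`) and the patch count, patch size, and mask ratio are assumptions chosen for the example, and the Patch Embedding Module, MAE3D Transformers, and reconstruction decoder are omitted entirely.

```python
import numpy as np

def farthest_point_sample(points, n_centers, seed=0):
    """Greedy farthest-point sampling: pick patch centers spread over the cloud."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(points)))]
    dists = np.full(len(points), np.inf)
    for _ in range(n_centers - 1):
        # Distance of every point to its nearest already-chosen center.
        dists = np.minimum(dists, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(np.argmax(dists)))
    return np.array(centers)

def make_patches(points, n_patches=8, patch_size=32):
    """Group the cloud into n_patches local patches: kNN around FPS centers."""
    centers = farthest_point_sample(points, n_patches)
    # Pairwise distances from each center to every point: (n_patches, N).
    d = np.linalg.norm(points[None, :, :] - points[centers][:, None, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :patch_size]   # indices of nearest neighbors
    return points[knn]                            # (n_patches, patch_size, 3)

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Randomly mask a ratio of patches; return visible patches and the mask."""
    rng = np.random.default_rng(seed)
    n = len(patches)
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, int(round(n * mask_ratio)), replace=False)] = True
    return patches[~mask], mask
```

Under this sketch, only the visible (unmasked) patches would be fed to the encoder, while the boolean mask tells the decoder which patch positions to reconstruct.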
Pages: 820-831 (12 pages)