Masked Autoencoders in 3D Point Cloud Representation Learning

Cited by: 4
Authors
Jiang, Jincen [1]
Lu, Xuequan [2]
Zhao, Lizhi [1]
Dazeley, Richard [3]
Wang, Meili [1]
Affiliations
[1] NorthWest A&F Univ, Coll Informat Engn, Yangling 712100, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & IT, Melbourne, Vic 3000, Australia
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
Keywords
Point cloud compression; Transformers; Task analysis; Feature extraction; Three-dimensional displays; Solid modeling; Decoding; Self-supervised learning; point cloud; completion; NETWORK
DOI
10.1109/TMM.2023.3314973
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. Recently, methods based on masked autoencoders have been explored in this area. When the content is regular, as with word sequences and 2D pixels, the input can be masked intuitively. Extending this to 3D point clouds is challenging, however, because point clouds are irregular. In this article, we propose masked autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of the unmasked patches. Second, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches, and then complete the latent representations of the masked patches. Finally, we use our Point Cloud Reconstruction Module with a multi-task loss to complete the incomplete point cloud. We conduct self-supervised pre-training on ShapeNet55 with the point cloud completion pretext task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB_T50_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods (93.4% and 86.2% classification accuracy, respectively).
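The first step of the abstract (splitting a point cloud into patches and masking a portion of them) can be illustrated with a minimal NumPy sketch. The abstract does not say how patches are formed or what masking ratio is used, so everything specific here is an assumption made for illustration: farthest point sampling for patch centers, k-nearest-neighbour grouping, the function names `farthest_point_sampling` and `split_and_mask`, and the values `n_patches=64`, `patch_size=32`, `mask_ratio=0.7`. It is not the authors' implementation.

```python
import numpy as np


def farthest_point_sampling(points, n_centers, rng):
    """Return indices of n_centers points spread out over the cloud.

    Assumption: the source abstract does not name a center-selection scheme;
    farthest point sampling is used here only as a common choice.
    """
    n = points.shape[0]
    chosen = np.empty(n_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)          # random starting point
    dist = np.full(n, np.inf)            # squared distance to nearest chosen center
    for i in range(1, n_centers):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist)) # farthest point from all chosen so far
    return chosen


def split_and_mask(points, n_patches=64, patch_size=32, mask_ratio=0.7, seed=0):
    """Group an (N, 3) cloud into patches and mark a random subset as masked.

    All default values are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sampling(points, n_patches, rng)]

    # Squared distance from every center to every point: (n_patches, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]

    # Express each patch relative to its own center.
    patches = points[knn_idx] - centers[:, None, :]

    # Randomly choose which patches are hidden from the encoder.
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return patches, centers, mask


cloud = np.random.default_rng(1).normal(size=(1024, 3))
patches, centers, mask = split_and_mask(cloud)
visible = patches[~mask]   # only these patches would be embedded by the encoder
```

In this sketch, `visible` plays the role of the unmasked patches that an embedding module would consume, while `centers[mask]` marks where the masked patches would have to be completed.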
Pages: 820-831
Page count: 12
Related papers
50 in total
  • [41] 3D Point Cloud Registration based on the Vector Field Representation
    Van Tung Nguyen
    Trung-Thien Tran
    Van-Toan Cao
    Laurendeau, Denis
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 491 - 495
  • [42] Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
    Yu, Xumin
    Tang, Lulu
    Rao, Yongming
    Huang, Tiejun
    Zhou, Jie
    Lu, Jiwen
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19291 - 19300
  • [43] Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
    Wu, Xiaoyang
    Wen, Xin
    Liu, Xihui
    Zhao, Hengshuang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9415 - 9424
  • [44] Learning 3D Face Representation with Vision Transformer for Masked Face Recognition
    Wang, Yuan
    Yang, Zhen
    Zhang, Zhiqiang
    Zang, Huaijuan
    Zhu, Qiang
    Zhan, Shu
    2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 505 - 511
  • [45] GMAE: Representation Learning on Graph via Masked Graph Autoencoders
    Zheng, Chengbin
    Yang, Zhicheng
    Lu, Yang
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2515 - 2521
  • [46] Scene Graph Masked Variational Autoencoders for 3D Scene Generation
    Xu, Rui
    Hui, Le
    Han, Yuehui
    Qian, Jianjun
    Xie, Jin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5725 - 5733
  • [47] Privacy Protection in MRI Scans Using 3D Masked Autoencoders
    Van der Goten, Lennart A.
    Smith, Kevin
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 583 - 592
  • [48] Point Cloud Domain Adaptation via Masked Local 3D Structure Prediction
    Liang, Hanxue
    Fan, Hehe
    Fan, Zhiwen
    Wang, Yi
    Chen, Tianlong
    Cheng, Yu
    Wang, Zhangyang
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 156 - 172
  • [49] Masked Autoencoder for Pre-Training on 3D Point Cloud Object Detection
    Xie, Guangda
    Li, Yang
    Qu, Hongquan
    Sun, Zaiming
    MATHEMATICS, 2022, 10 (19)
  • [50] DCCN: A dual-cross contrastive neural network for 3D point cloud representation learning
    Wu, Xiaopeng
    Shi, Guangsi
    Zhao, Zexing
    Li, Mingjie
    Gao, Xiaojun
    Yan, Xiaoli
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249