T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

Authors
Wei, Weijie [1]
Nejadasl, Fatemeh Karimi [1]
Gevers, Theo [1]
Oswald, Martin R. [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Source
Keywords
Self-supervised learning; LiDAR point cloud; 3D detection; Networks
DOI
10.1007/978-3-031-73247-8_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, scholars have been actively investigating efficacious self-supervised pre-training paradigms. Nevertheless, temporal information, which is inherent in the LiDAR point cloud sequence, is consistently disregarded. To better utilize this property, we propose an effective pre-training strategy, namely Temporal Masked Auto-Encoders (T-MAE), which takes as input temporally adjacent frames and learns temporal dependency. A SiamWCA backbone, containing a Siamese encoder and a windowed cross-attention (WCA) module, is established for the two-frame input. Considering that the movement of an ego-vehicle alters the view of the same instance, temporal modeling also serves as a robust and natural data augmentation, enhancing the comprehension of target objects. SiamWCA is a powerful architecture but heavily relies on annotated data. Our T-MAE pre-training strategy alleviates its demand for annotated data. Comprehensive experiments demonstrate that T-MAE achieves the best performance on both Waymo and ONCE datasets among competitive self-supervised approaches.
Pages: 178-195
Page count: 18
Related papers
50 in total
  • [41] Enhanced Local and Global Learning for Rotation-Invariant Point Cloud Representation
    Gu, Ruibin
    Wu, Qiuxia
    Li, Yuqiong
    Kang, Wenxiong
    Ng, Wing W. Y.
    Wang, Zhiyong
    IEEE MULTIMEDIA, 2022, 29 (04) : 24 - 37
  • [42] DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation
    Ye, Maosheng
    Xu, Shuangjie
    Cao, Tongyi
    Chen, Qifeng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 7427 - 7436
  • [43] ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
    Qian, Guocheng
    Hammoud, Hasan Abed Al Kader
    Li, Guohao
    Thabet, Ali
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [44] Feature extraction and representation learning of 3D point cloud data
    Si, Hongying
    Wei, Xianyong
    IMAGE AND VISION COMPUTING, 2024, 142
  • [45] LEARNING SPATIAL-TEMPORAL EMBEDDINGS FOR SEQUENTIAL POINT CLOUD FRAME INTERPOLATION
    Zhao, Lili
    Sun, Zhuoqun
    Ren, Lancao
    Yin, Qian
    Yang, Lei
    Guo, Meng
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 810 - 814
  • [46] Dynamic Point Cloud Inpainting via Spatial-Temporal Graph Learning
    Fu, Zeqing
    Hu, Wei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3022 - 3034
  • [47] Generative Variational-Contrastive Learning for Self-Supervised Point Cloud Representation
    Wang, Bohua
    Tian, Zhiqiang
    Ye, Aixue
    Wen, Feng
    Du, Shaoyi
    Gao, Yue
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 6154 - 6166
  • [48] Self-Supervised Point Cloud Representation Learning via Separating Mixed Shapes
    Sun, Chao
    Zheng, Zhedong
    Wang, Xiaohan
    Xu, Mingliang
    Yang, Yi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6207 - 6218
  • [49] Quadratic Terms Based Point-to-Surface 3D Representation for Deep Learning of Point Cloud
    Sun, Tiecheng
    Liu, Guanghui
    Li, Ru
    Liu, Shuaicheng
    Zhu, Shuyuan
    Zeng, Bing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 2705 - 2718
  • [50] Dynamic Representation Learning with Temporal Point Processes for Higher-Order Interaction Forecasting
    Gracious, Tony
    Dukkipati, Ambedkar
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023: 7748 - 7756