T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning

Cited by: 0
Authors
Wei, Weijie [1 ]
Nejadasl, Fatemeh Karimi [1 ]
Gevers, Theo [1 ]
Oswald, Martin R. [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Self-supervised learning; LiDAR point cloud; 3D detection; Networks
DOI
10.1007/978-3-031-73247-8_11
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The scarcity of annotated data in LiDAR point cloud understanding hinders effective representation learning. Consequently, researchers have been actively investigating effective self-supervised pre-training paradigms. Nevertheless, temporal information, inherent in LiDAR point cloud sequences, is consistently disregarded. To better exploit this property, we propose an effective pre-training strategy, Temporal Masked Auto-Encoders (T-MAE), which takes temporally adjacent frames as input and learns temporal dependencies. A SiamWCA backbone, comprising a Siamese encoder and a windowed cross-attention (WCA) module, is established for the two-frame input. Since the movement of the ego-vehicle alters the view of the same instance, temporal modeling also serves as a robust and natural data augmentation, enhancing the comprehension of target objects. SiamWCA is a powerful architecture but relies heavily on annotated data; our T-MAE pre-training strategy alleviates this demand. Comprehensive experiments demonstrate that T-MAE achieves the best performance among competitive self-supervised approaches on both the Waymo and ONCE datasets.
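The abstract's core idea, a Siamese (weight-sharing) encoder over two frames whose outputs are fused by cross-attention from the masked current frame onto the previous frame, can be illustrated with a minimal NumPy sketch. This is a simplified single-head, single-window illustration under assumed shapes; all names (`siamwca_sketch`, `W_enc`, etc.) are hypothetical and do not reflect the paper's actual implementation, which operates on voxelized LiDAR tokens with windowed attention.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # token embedding dimension (illustrative)

def softmax(x):
    # numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Siamese encoder: the SAME weights encode both frames (sketched as one linear map)
W_enc = rng.normal(size=(d, d)) / np.sqrt(d)
# cross-attention projections (hypothetical, single head)
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

def siamwca_sketch(prev_frame, curr_visible):
    """prev_frame: (N, d) tokens of the previous frame.
    curr_visible: (M, d) unmasked tokens of the current frame (rest masked out)."""
    h_prev = prev_frame @ W_enc    # branch 1
    h_curr = curr_visible @ W_enc  # branch 2, shared weights -> Siamese
    # current-frame queries attend to previous-frame keys/values,
    # injecting temporal context into the masked current frame
    q, k, v = h_curr @ W_q, h_prev @ W_k, h_prev @ W_v
    attn = softmax(q @ k.T / np.sqrt(d))  # (M, N) attention weights
    return h_curr + attn @ v              # residual fusion of temporal context

prev = rng.normal(size=(40, d))   # previous frame: 40 tokens
curr = rng.normal(size=(12, d))   # current frame: e.g. 70% of tokens masked away
out = siamwca_sketch(prev, curr)
print(out.shape)  # (12, 16)
```

In the full method, a decoder would then reconstruct the masked current-frame points from these fused features, which is what makes the pre-training objective a masked autoencoder rather than a contrastive one.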
Pages: 178-195
Page count: 18
Related papers
50 total (10 shown)
  • [1] Masked Autoencoders in 3D Point Cloud Representation Learning
    Jiang, Jincen
    Lu, Xuequan
    Zhao, Lizhi
    Dazeley, Richard
    Wang, Meili
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 820 - 831
  • [2] Masked Autoencoders for Point Cloud Self-supervised Learning
    Pang, Yatian
    Wang, Wenxiao
    Tay, Francis E. H.
    Liu, Wei
    Tian, Yonghong
    Yuan, Li
    COMPUTER VISION - ECCV 2022, PT II, 2022, 13662 : 604 - 621
  • [3] ViC-MAE: Self-supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
    Hernandez, Jefferson
    Villegas, Ruben
    Ordonez, Vicente
    COMPUTER VISION-ECCV 2024, PT IV, 2025, 15062 : 444 - 463
  • [4] DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
    Li, Yanlong
    Madarasingha, Chamara
    Thilakarathna, Kanchana
    COMPUTER VISION-ECCV 2024, PT XLVI, 2025, 15104 : 362 - 380
  • [5] Enhancing Representation Learning of EEG Data with Masked Autoencoders
    Zhou, Yifei
    Liu, Sitong
    AUGMENTED COGNITION, PT II, AC 2024, 2024, 14695 : 88 - 100
  • [6] PatchMixing Masked Autoencoders for 3D Point Cloud Self-Supervised Learning
    Lin, Chengxing
    Xu, Wenju
    Zhu, Jian
    Nie, Yongwei
    Cai, Ruichu
    Xu, Xuemiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9882 - 9897
  • [7] NeRF-MAE: Masked AutoEncoders for Self-supervised 3D Representation Learning for Neural Radiance Fields
    Irshad, Muhammad Zubair
    Zakharov, Sergey
    Guizilini, Vitor
    Gaidon, Adrien
    Kira, Zsolt
    Ambrus, Rares
    COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 434 - 453
  • [8] BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
    Lin, Zhiwei
    Wang, Yongtao
    Qi, Shengxiang
    Dong, Nan
    Yang, Ming-Hsuan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3531 - 3539
  • [9] GMAE: Representation Learning on Graph via Masked Graph Autoencoders
    Zheng, Chengbin
    Yang, Zhicheng
    Lu, Yang
PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 2515 - 2521
  • [10] Masked Structural Point Cloud Modeling to Learning 3D Representation
    Yamada, Ryosuke
    Tadokoro, Ryu
    Qiu, Yue
    Kataoka, Hirokatsu
    Satoh, Yutaka
    IEEE ACCESS, 2024, 12 : 142291 - 142305