Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos

Cited by: 2
Authors
Shen, Zhiqiang [1, 5]
Sheng, Xiaoxiao [1]
Fan, Hehe [2]
Wang, Longguang [3]
Guo, Yulan [4]
Liu, Qiong [5]
Wen, Hao [5]
Zhou, Xi [1, 5]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Zhejiang Univ, Hangzhou, Peoples R China
[3] Aviat Univ Air Force, Hangzhou, Peoples R China
[4] Sun Yat Sen Univ, Guangzhou, Peoples R China
[5] CloudWalk, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV51070.2023.01520
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the community has made tremendous progress in developing effective methods for point cloud video understanding that learn from massive amounts of labeled data. However, annotating point cloud videos is usually notoriously expensive. Moreover, training via one or only a few traditional tasks (e.g., classification) may be insufficient to learn subtle details of the spatio-temporal structure existing in point cloud videos. In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations. MaST-Pre is based on spatio-temporal point-tube masking and consists of two self-supervised learning tasks. First, by reconstructing masked point tubes, our method is able to capture the appearance information of point cloud videos. Second, to learn motion, we propose a temporal cardinality difference prediction task that estimates the change in the number of points within a point tube. In this way, MaST-Pre is forced to model the spatial and temporal structure in point cloud videos. Extensive experiments on MSRAction-3D, NTU-RGBD, NvGesture, and SHREC'17 demonstrate the effectiveness of the proposed method. The code is available at https://github.com/JohnsonSign/MaST-Pre.
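To make the temporal cardinality difference idea concrete, the following is a minimal sketch of how such a target could be computed. It is not taken from the MaST-Pre code. It assumes a point tube can be approximated by a fixed spherical neighborhood (a hypothetical `center` and `radius`) applied to every frame, and the function names `tube_cardinalities` and `cardinality_differences` are illustrative only. The paper's actual tube construction and any normalization of the target may differ.

```python
from math import dist


def tube_cardinalities(frames, center, radius):
    """Count, for each frame, the points within `radius` of `center`.

    Assumption: the tube is the same spherical neighborhood in every frame.
    """
    return [
        sum(1 for point in frame if dist(point, center) <= radius)
        for frame in frames
    ]


def cardinality_differences(counts):
    """Change in point count between each pair of consecutive frames."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Toy point cloud video: three frames, each a list of (x, y, z) points.
frames = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (2.0, 2.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 2.0, 2.0)],
    [(0.0, 0.1, 0.0), (0.2, 0.0, 0.0), (0.0, 0.0, 0.3)],
]

counts = tube_cardinalities(frames, center=(0.0, 0.0, 0.0), radius=0.5)
diffs = cardinality_differences(counts)
print(counts)  # [2, 1, 3]
print(diffs)   # [-1, 2]
```

A negative difference means points left the neighborhood between two frames and a positive one means points entered it, which is why a signal of this kind can serve as a proxy for local motion without any labels.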
Pages: 16534-16543
Page count: 10