Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

Cited by: 12
Authors
Xu, Chenxin [1, 2]
Tan, Robby T. [2]
Tan, Yuhong [1]
Chen, Siheng [1, 3]
Wang, Xinchao [2]
Wang, Yanfeng [1, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
DOI
10.1109/ICCV51070.2023.00872
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, the coordinates of some body joints are corrupted by either masking or adding noise, and the goal is to recover the corrupted coordinates from the remaining coordinates. To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery by capturing spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted transformer is encouraged to capture more comprehensive spatial-temporal dependencies among body joints' coordinates, leading to better feature learning. Extensive experimental results show that our method outperforms state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and 3DPW datasets, respectively. We also demonstrate that our method is more robust when data are missing or noisy. Code is available at https://github.com/MediaBrain-SJTU/AuxFormer.
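The two corruption schemes and the MPJPE metric mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the `(frames, joints, 3)` array layout, the coordinate-level random selection, the zero fill value, and the Gaussian noise model are all assumptions made for illustration.

```python
import numpy as np


def mask_coordinates(motion, ratio, rng, fill_value=0.0):
    """Replace a random subset of coordinates with fill_value (assumed: 0).

    motion: array of shape (frames, joints, 3).
    Returns the corrupted copy and a boolean mask of corrupted entries.
    """
    mask = rng.random(motion.shape) < ratio
    corrupted = np.where(mask, fill_value, motion)
    return corrupted, mask


def add_noise(motion, ratio, sigma, rng):
    """Add Gaussian noise (assumed noise model) to a random subset of coordinates."""
    mask = rng.random(motion.shape) < ratio
    noise = rng.normal(0.0, sigma, size=motion.shape)
    corrupted = motion + mask * noise
    return corrupted, mask


def recovery_loss(recovered, target, mask):
    """Mean squared error restricted to the corrupted entries."""
    count = mask.sum()
    if count == 0:
        return 0.0
    return float((((recovered - target) ** 2) * mask).sum() / count)


def mpjpe(pred, target):
    """Mean per joint position error: mean Euclidean distance
    between predicted and true joint positions over frames and joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(10, 22, 3))  # hypothetical 10 frames, 22 joints
    masked, m_mask = mask_coordinates(motion, ratio=0.3, rng=rng)
    noisy, n_mask = add_noise(motion, ratio=0.3, sigma=0.1, rng=rng)
    # A real model would map `masked` or `noisy` back to clean coordinates;
    # here the corrupted input itself stands in for a (poor) recovery.
    print("masking recovery loss:", recovery_loss(masked, motion, m_mask))
    print("denoising recovery loss:", recovery_loss(noisy, motion, n_mask))
    print("MPJPE of noisy input:", mpjpe(noisy, motion))
```

In a training setup along the lines the abstract describes, the recovery losses would be added to the main future-motion prediction loss, so that the network is pushed to infer corrupted coordinates from the uncorrupted ones.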
Pages: 9475-9486
Number of pages: 12
Related papers
50 in total
  • [1] Skeleton Graph Scattering Networks for 3D Skeleton-based Human Motion Prediction
    Li, Maosen
    Chen, Siheng
    Liu, Zihui
    Zhang, Zijing
    Xie, Lingxi
    Tian, Qi
    Zhang, Ya
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 854 - 864
  • [2] Gradient multi-foci networks for 3D skeleton-based human motion prediction
    Shi J.
    Zhong J.
    He Z.
    Cao W.
    Neural Computing and Applications, 2024, 36 (24) : 14627 - 14642
  • [3] Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction
    Li, Maosen
    Chen, Siheng
    Zhao, Yangheng
    Zhang, Ya
    Wang, Yanfeng
    Tian, Qi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 211 - 220
  • [4] Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction
    Li, Maosen
    Chen, Siheng
    Chen, Xu
    Zhang, Ya
    Wang, Yanfeng
    Tian, Qi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (06) : 3316 - 3333
  • [5] 3D skeleton-based human motion prediction using spatial-temporal graph convolutional network
    Huang, Jianying
    Kang, Hoon
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (03)
  • [6] Skeleton-Based Human Motion Prediction With Privileged Supervision
    Dong, Minjing
    Xu, Chang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10419 - 10432
  • [7] 3-D Skeleton-Based Human Motion Prediction With Manifold-Aware GAN
    Chopin, Baptiste
    Otberdout, Naima
    Daoudi, Mohamed
    Bartolo, Angela
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03) : 321 - 333
  • [8] Parallel multi-stage rectification networks for 3D skeleton-based motion prediction
    Zhong, Jianqi
    Ye, Conghui
    Cao, Wenming
    Wang, Hao
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [9] Dynamic Differencing-Based Hybrid Network for Improved 3D Skeleton-Based Motion Prediction
    Ji, Ruiya
    Lu, Chengjie
    Zhong, Jianqi
    AI, 2024, 5 (04) : 2897 - 2913
  • [10] Video-Based Motion Capturing for Skeleton-Based 3D Models
    Shih, Liang-Yu
    Chen, Bing-Yu
    Wu, Ja-Ling
    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, 2009, 5414 : 748 - 758