3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6
Authors
Xue, Youze [1]
Chen, Jiansheng [2]
Zhang, Yudong [1]
Yu, Cheng [1]
Ma, Huimin [2]
Ma, Hongbing [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D human pose estimation; vision transformers; learnable sampling;
DOI
10.1145/3503161.3548133
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Reconstructing a 3D human mesh from a single RGB image is challenging because of the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation to the feature maps to exploit the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, but their grid tokens do not adapt to the positions and shapes of human joints in different images. Spatial attention, by contrast, focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint adaptive tokens, and then use transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints. The sampling weights are predicted by a learnable network, so the model learns to sample joint-related features adaptively. Our adaptive tokens are explicitly correlated with human joints, which enables more effective modeling of global dependencies among joints. To validate our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods on both the vertex-based and the joint-based metrics. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
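The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalisation over spatial positions, the single linear projection standing in for the learnable network, and the names `sample_joint_tokens` and `w_proj` are all assumptions made for illustration. It shows only how per-joint weights over a feature map could turn into one token per joint.

```python
import numpy as np


def sample_joint_tokens(feature_map, weight_logits):
    """Form one token per joint as a weighted sum of feature vectors.

    feature_map:   array of shape (C, H, W), e.g. CNN backbone output.
    weight_logits: array of shape (J, H, W), unnormalised sampling scores
                   for J joints (assumed to come from a learnable network).
    Returns (tokens, weights) with shapes (J, C) and (J, H, W).
    """
    C, H, W = feature_map.shape
    J = weight_logits.shape[0]
    feats = feature_map.reshape(C, H * W)
    logits = weight_logits.reshape(J, H * W)
    # Numerically stable softmax over spatial positions (an assumption:
    # the abstract does not say how the weights are normalised).
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each token is a convex combination of the H*W feature vectors.
    tokens = weights @ feats.T
    return tokens, weights.reshape(J, H, W)


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 4, 4))   # C=8 channels, 4x4 grid
# Hypothetical stand-in for the learnable network: a per-pixel linear map
# (equivalent to a 1x1 convolution) from C channels to J=3 joint scores.
w_proj = 0.1 * rng.standard_normal((3, 8))
weight_logits = np.einsum("jc,chw->jhw", w_proj, feature_map)
tokens, weights = sample_joint_tokens(feature_map, weight_logits)
```

Under these assumptions, `tokens` would be the joint adaptive tokens passed to a transformer; when a joint's weights concentrate on a single position, its token reduces to the feature vector at that position.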
Pages: 6765-6773
Number of pages: 9