3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Times cited: 6
Authors
Xue, Youze [1]
Chen, Jiansheng [2]
Zhang, Yudong [1]
Yu, Cheng [1]
Ma, Huimin [2]
Ma, Hongbing [1]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
[2] University of Science and Technology, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
3D human pose estimation; vision transformers; learnable sampling
DOI
10.1145/3503161.3548133
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Reconstructing a 3D human mesh from a single RGB image is challenging because of inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation on the feature maps to exploit the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, but their grid tokens do not adapt to the positions and shapes of human joints in different images. In contrast, spatial attention focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint adaptive tokens, and then use transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints. The sampling weights are predicted by a learnable network, so the model learns to sample joint-related features adaptively. Because our adaptive tokens are explicitly correlated with human joints, global dependencies among different joints can be modeled more effectively. To validate the effectiveness of our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods on both the vertex-based and the joint-based metrics. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
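The abstract describes two steps: forming one token per joint by sampling feature vectors from the feature map with learned weights, then letting a transformer aggregate information across those joint tokens. The following is a minimal pure-Python sketch of that data flow, not the authors' released code. It assumes that "sampling" means a softmax-normalized weighted sum over all flattened feature-map positions, and it takes the per-joint sampling scores as inputs instead of predicting them with a network. The names `sample_joint_tokens` and `self_attention` are hypothetical. The attention step also omits the learned query/key/value projections, multiple heads, and feed-forward layers of a real transformer.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a flat list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_joint_tokens(features: List[Vector], logits: List[Vector]) -> List[Vector]:
    """Build one token per joint as a weighted sum of feature vectors (assumed scheme).

    features: N feature vectors (the H*W positions of a flattened feature map), each of dim C.
    logits:   J rows of N scores; row j holds the sampling scores for joint j.
              ASSUMPTION: in the paper these come from a learnable network;
              here they are simply given as inputs.
    Returns J tokens, each of dim C.
    """
    dim = len(features[0])
    tokens = []
    for joint_logits in logits:
        weights = softmax(joint_logits)  # this joint's sampling weights sum to 1
        token = [sum(w * f[c] for w, f in zip(weights, features)) for c in range(dim)]
        tokens.append(token)
    return tokens


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with no learned projections.

    Every joint token attends to every joint token, illustrating the global
    aggregation a transformer layer performs over the joint tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        out.append([sum(a * v[c] for a, v in zip(attn, tokens)) for c in range(d)])
    return out


# Toy example: a 2x2 feature map flattened to 4 positions with C = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two hypothetical joints: joint 0 strongly prefers position 0; joint 1 is uniform.
logits = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
tokens = sample_joint_tokens(features, logits)
mixed = self_attention(tokens)
```

In this toy case, joint 1's uniform weights make its token the mean feature `[0.5, 0.5]`, while joint 0's token is almost exactly the feature at position 0. The design point the sketch tries to show is that each token is tied to one joint by its own sampling weights, rather than to a fixed grid cell, before the attention step mixes information across joints.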
Pages: 6765-6773
Number of pages: 9