3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6
Authors
Xue, Youze [1 ]
Chen, Jiansheng [2 ]
Zhang, Yudong [1 ]
Yu, Cheng [1 ]
Ma, Huimin [2 ]
Ma, Hongbing [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol Beijing, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human pose estimation; vision transformers; learnable sampling;
DOI
10.1145/3503161.3548133
CLC number
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
Reconstructing a 3D human mesh from a single RGB image is challenging due to the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation on the feature maps to explore the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, yet their grid tokens are not adaptive to the positions and shapes of human joints in different images. In contrast, spatial attention focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint-adaptive tokens and then uses transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints, with the sampling weights predicted by a learnable network so that the model learns to sample joint-related features adaptively. Because our adaptive tokens are explicitly correlated with human joints, global dependencies among joints can be modeled more effectively. To validate the effectiveness of our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods in terms of both vertex-based and joint-based metrics. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
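The abstract describes the core mechanism: a learnable network predicts per-joint sampling weights over the CNN feature map, each joint token is formed as a weighted sum of spatial feature vectors, and a transformer then aggregates global information across the joint tokens. Below is a minimal PyTorch sketch of that idea; all module and parameter names (LearnableSampling, weight_pred, the channel and joint counts) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of joint-adaptive token sampling: a 1x1 conv predicts one
# sampling-weight map per joint; each joint token is the softmax-weighted
# sum of feature vectors over all spatial locations.
import torch
import torch.nn as nn


class LearnableSampling(nn.Module):
    def __init__(self, in_channels=256, num_joints=24, embed_dim=256):
        super().__init__()
        # Predicts one spatial weight map per joint (hypothetical design).
        self.weight_pred = nn.Conv2d(in_channels, num_joints, kernel_size=1)
        self.proj = nn.Linear(in_channels, embed_dim)

    def forward(self, feat):                          # feat: (B, C, H, W)
        logits = self.weight_pred(feat)               # (B, J, H, W)
        weights = logits.flatten(2).softmax(dim=-1)   # normalize over H*W
        tokens = torch.einsum('bjn,bcn->bjc',
                              weights, feat.flatten(2))  # (B, J, C)
        return self.proj(tokens)                      # joint-adaptive tokens


# Global aggregation over the joint tokens with a standard transformer.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
sampler = LearnableSampling()
feat = torch.randn(2, 256, 56, 56)   # dummy CNN feature map
out = encoder(sampler(feat))         # (2, 24, 256): one token per joint
```

The softmax over spatial locations makes each token a convex combination of feature vectors, so the predicted weights act as a soft, differentiable sampling grid that can adapt to joint positions and shapes in each image.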
Pages: 6765-6773 (9 pages)