3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6
Authors
Xue, Youze [1 ]
Chen, Jiansheng [2 ]
Zhang, Yudong [1 ]
Yu, Cheng [1 ]
Ma, Huimin [2 ]
Ma, Hongbing [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human pose estimation; vision transformers; learnable sampling;
DOI
10.1145/3503161.3548133
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Reconstructing a 3D human mesh from a single RGB image is challenging because of the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation to the feature maps to recover the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, but their grid tokens do not adapt to the positions and shapes of human joints in different images. Spatial attention, by contrast, focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint-adaptive tokens, and then use transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of the different joints. The sampling weights are predicted by a learnable network, so the model learns to sample joint-related features adaptively. Because our adaptive tokens are explicitly correlated with human joints, global dependencies among joints can be modeled more effectively. To validate our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods on both vertex-based and joint-based metrics. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
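The sample-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: it assumes the sampling weights are a softmax over spatial positions, stands in for the learnable weight-prediction network with a single hypothetical linear projection (weight_proj), and replaces the transformer with one self-attention step using identity projections. All function and variable names here are illustrative only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_joint_tokens(features, weight_proj):
    """Form one token per joint as a weighted sum of feature vectors.

    features:    N vectors of length C (a flattened H*W feature map).
    weight_proj: J vectors of length C; a hypothetical stand-in for the
                 learnable network that predicts the sampling weights.
    Returns (tokens, weights) with shapes J x C and J x N.
    """
    channels = len(features[0])
    tokens, weights = [], []
    for w_j in weight_proj:
        # Assumption: sampling weights are a softmax over all positions.
        alpha = softmax([dot(w_j, f) for f in features])
        token = [sum(a * f[c] for a, f in zip(alpha, features))
                 for c in range(channels)]
        tokens.append(token)
        weights.append(alpha)
    return tokens, weights


def self_attention(tokens):
    """One self-attention step with identity Q/K/V projections
    (a simplification of a transformer layer)."""
    scale = math.sqrt(len(tokens[0]))
    mixed = []
    for q in tokens:
        scores = softmax([dot(q, k) / scale for k in tokens])
        mixed.append([sum(s * v[c] for s, v in zip(scores, tokens))
                      for c in range(len(q))])
    return mixed


random.seed(0)
C, H, W, J = 8, 4, 4, 3  # toy sizes, chosen arbitrarily
features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(H * W)]
weight_proj = [[random.gauss(0, 1) for _ in range(C)] for _ in range(J)]

tokens, weights = sample_joint_tokens(features, weight_proj)
mixed = self_attention(tokens)  # each joint token now sees every other joint
```

In this sketch each joint token draws on the whole feature map through its own weight map, and the attention step then lets every joint token attend to all others, which loosely mirrors the combination of joint-specific sampling and global aggregation that the abstract describes.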
Pages: 6765-6773
Number of pages: 9