3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6

Authors:

Xue, Youze [1]

Chen, Jiansheng [2]

Zhang, Yudong [1]

Yu, Cheng [1]

Ma, Huimin [2]

Ma, Hongbing [1]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Univ Sci & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

3D human pose estimation; vision transformers; learnable sampling;

DOI:

10.1145/3503161.3548133

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Reconstructing a 3D human mesh from a single RGB image is a challenging task due to the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation to the feature maps to explore the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, but their grid tokens do not adapt to the positions and shapes of human joints in different images. Spatial attention, by contrast, focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint adaptive tokens, and then use transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints. The sampling weights are predicted by a learnable network so that the model can learn to sample joint-related features adaptively. Because our adaptive tokens are explicitly correlated with human joints, global dependencies among different joints can be modeled more effectively. To validate the effectiveness of our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods on both the vertex-based metric and the joint-based metric. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
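
The sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `sample_joint_tokens`), the softmax normalization of the sampling weights, and the toy inputs are all assumptions made for illustration. In the paper the weights come from a learnable network; here they are supplied directly as hypothetical logits. The transformer that later aggregates the tokens is omitted.

```python
import math

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_joint_tokens(features, weight_logits):
    """Form one token per joint as a weighted sum of feature vectors.

    features:      N spatial positions x C channels (a flattened feature map).
    weight_logits: J joints x N positions. Assumed here to be the output of
                   a weight-predicting network, which is not modeled.
    Returns J tokens, each a list of C floats.
    """
    num_channels = len(features[0])
    tokens = []
    for logits in weight_logits:
        weights = softmax(logits)  # assumption: weights are softmax-normalized
        token = [
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(num_channels)
        ]
        tokens.append(token)
    return tokens

# Toy feature map: 4 spatial positions, 2 channels each.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
# Hypothetical logits for 2 joints: uniform, then sharply peaked at position 0.
weight_logits = [[0.0, 0.0, 0.0, 0.0], [50.0, 0.0, 0.0, 0.0]]
tokens = sample_joint_tokens(features, weight_logits)
```

With uniform logits the token is the average of all feature vectors. With sharply peaked logits it approaches the feature vector at a single position. That contrast is the sense in which a token can adapt to where a joint appears in the image.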


Pages: 6765-6773

Page count: 9

Related records: 50 in total

[41] Deformable Mesh Transformer for 3D Human Mesh Recovery
Yoshiyasu, Yusuke
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 17006 - 17015
[42] Study on 3D Model Reconstruction of Human Knee Joint Based on MRI
Han, Yuemei
MEASUREMENT TECHNOLOGY AND ENGINEERING RESEARCHES IN INDUSTRY, PTS 1-3, 2013, 333-335 : 934 - 937
[43] Adaptive 3d reconstruction of archaeological pottery
Kampel, M
Liska, C
Tosovic, S
MACHINE VISION APPLICATIONS IN INDUSTRIAL INSPECTION IX, 2001, 4301 : 42 - 51
[44] Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery
Weng, Zhenzhen
Wang, Kuan-Chieh
Kanazawa, Angjoo
Yeung, Serena
2022 INTERNATIONAL CONFERENCE ON 3D VISION, 3DV, 2022, : 261 - 270
[45] Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting
Zheng, Ruichen
Li, Peng
Wang, Haoqian
Yu, Tao
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 216 - 226
[46] Enhancing single-view 3D mesh reconstruction with the aid of implicit surface learning
Fahim, George
Amin, Khalid
Zarif, Sameh
IMAGE AND VISION COMPUTING, 2022, 119
[47] Adaptive 3D mesh reconstruction from dense unorganized weighted points using neural network
Yan, LM
Yuan, YW
Zeng, XH
PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 3238 - 3242
[48] End-to-End Human Pose and Mesh Reconstruction with Transformers
Lin, Kevin
Wang, Lijuan
Liu, Zicheng
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1954 - 1963
[49] Monocular 3D Face Reconstruction with Joint 2D and 3D Constraints
Cui, Huili
Yang, Jing
Lai, Yu-Kun
Li, Kun
ARTIFICIAL INTELLIGENCE, CICAI 2022, PT I, 2022, 13604 : 129 - 141
[50] Regular 3D mesh reconstruction based on cylindrical mapping
Khan, IR
Okuda, M
Takahashi, S
2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXP (ICME), VOLS 1-3, 2004, : 133 - 136
