3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6
Authors
Xue, Youze [1 ]
Chen, Jiansheng [2 ]
Zhang, Yudong [1 ]
Yu, Cheng [1 ]
Ma, Huimin [2 ]
Ma, Hongbing [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Sci & Technol Beijing, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human pose estimation; vision transformers; learnable sampling;
DOI
10.1145/3503161.3548133
CLC number
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
Reconstructing a 3D human mesh from a single RGB image is challenging due to the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation on the feature maps to explore the 3D cues embedded in the 2D image. Recently, two spatial aggregation methods, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, yet their grid tokens are not adaptive to the positions and shapes of human joints in different images. In contrast, spatial attention focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint-adaptive tokens and then uses transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints, with the sampling weights predicted by a learnable network so that the model learns to sample joint-related features adaptively. Because our adaptive tokens are explicitly correlated with human joints, global dependencies among joints can be modeled more effectively. To validate the effectiveness of our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods in terms of both vertex-based and joint-based metrics. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
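The abstract describes the core mechanism: a learnable network predicts per-joint sampling weights over the CNN feature map, each joint token is formed as a weighted sum of spatial feature vectors, and a transformer then aggregates global information across the joint tokens. Below is a minimal PyTorch sketch of that idea; all module and parameter names (LearnableSampling, weight_pred, the channel and joint counts) are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of joint-adaptive token sampling: a 1x1 conv predicts one
# sampling-weight map per joint; each joint token is the softmax-weighted
# sum of feature vectors over all spatial locations.
import torch
import torch.nn as nn


class LearnableSampling(nn.Module):
    def __init__(self, in_channels=256, num_joints=24, embed_dim=256):
        super().__init__()
        # Predicts one spatial weight map per joint (hypothetical design).
        self.weight_pred = nn.Conv2d(in_channels, num_joints, kernel_size=1)
        self.proj = nn.Linear(in_channels, embed_dim)

    def forward(self, feat):                          # feat: (B, C, H, W)
        logits = self.weight_pred(feat)               # (B, J, H, W)
        weights = logits.flatten(2).softmax(dim=-1)   # normalize over H*W
        tokens = torch.einsum('bjn,bcn->bjc',
                              weights, feat.flatten(2))  # (B, J, C)
        return self.proj(tokens)                      # joint-adaptive tokens


# Global aggregation over the joint tokens with a standard transformer.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
sampler = LearnableSampling()
feat = torch.randn(2, 256, 56, 56)   # dummy CNN feature map
out = encoder(sampler(feat))         # (2, 24, 256): one token per joint
```

The softmax over spatial locations makes each token a convex combination of feature vectors, so the predicted weights act as a soft, differentiable sampling grid that can adapt to joint positions and shapes in each image.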
Pages: 6765-6773 (9 pages)