3D Human Mesh Reconstruction by Learning to Sample Joint Adaptive Tokens for Transformers

Cited by: 6

Authors:

Xue, Youze [1]

Chen, Jiansheng [2]

Zhang, Yudong [1]

Yu, Cheng [1]

Ma, Huimin [2]

Ma, Hongbing [1]

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Univ Sci & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

3D human pose estimation; vision transformers; learnable sampling;

DOI:

10.1145/3503161.3548133

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Reconstructing a 3D human mesh from a single RGB image is challenging because of the inherent depth ambiguity. Researchers commonly use convolutional neural networks to extract features and then apply spatial aggregation to the feature maps to exploit the 3D cues embedded in the 2D image. Recently, two methods of spatial aggregation, transformers and spatial attention, have been adopted to achieve state-of-the-art performance, but both have limitations. Transformers help model long-range dependencies across different joints, but their grid tokens do not adapt to the positions and shapes of human joints in different images. Spatial attention, by contrast, focuses on joint-specific features, but its concentrated attention maps ignore non-local information about the body. To address these issues, we propose a Learnable Sampling module that generates joint-adaptive tokens, and then use transformers to aggregate global information. Feature vectors are sampled from the feature maps to form the tokens of different joints. The sampling weights are predicted by a learnable network, so the model can learn to sample joint-related features adaptively. Our adaptive tokens are explicitly correlated with human joints, which enables more effective modeling of global dependencies among different joints. To validate the effectiveness of our method, we conduct experiments on several popular datasets, including Human3.6M and 3DPW. Our method achieves lower reconstruction errors than previous state-of-the-art methods in terms of both the vertex-based metric and the joint-based metric. The code and trained models are released at https://github.com/thuxyz19/Learnable-Sampling.
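
The sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one plausible reading in which each joint token is a softmax-weighted sum of the flattened feature-map vectors. The names `softmax` and `sample_joint_tokens` are hypothetical, and the per-joint scores are hard-coded here, whereas in the paper they are predicted by a learnable network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_joint_tokens(features, joint_logits):
    """Form one token per joint as a weighted sum of feature vectors.

    features:     N feature vectors (a flattened H*W grid), each of length C.
    joint_logits: J rows of N scores. Assumption: a real model would predict
                  these from the feature map; here they are given by hand.
    """
    channels = len(features[0])
    tokens = []
    for logits in joint_logits:
        weights = softmax(logits)  # sampling weights over the N positions
        token = [
            sum(w * f[c] for w, f in zip(weights, features))
            for c in range(channels)
        ]
        tokens.append(token)
    return tokens

# Toy example: a 2x2 feature map (N = 4 positions) with C = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
# Two joints: the first weights all positions equally,
# the second concentrates almost entirely on the last position.
joint_logits = [[0.0, 0.0, 0.0, 0.0], [-50.0, -50.0, -50.0, 50.0]]
tokens = sample_joint_tokens(features, joint_logits)
```

In this sketch the first token is the plain average of the four feature vectors, while the second is essentially the feature vector at the last position. The resulting J tokens would then be passed to a transformer to aggregate global information across joints.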


Pages: 6765-6773

Page count: 9
