One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

Cited by: 26
Authors
Lin, Jing [1, 2]
Zeng, Ailing [1]
Wang, Haoqian [2]
Zhang, Lei [1]
Li, Yu [1]
Affiliations
[1] Int Digital Econ Acad IDEA, Shenzhen, Peoples R China
[2] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
DOI
10.1109/CVPR52729.2023.02027
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Whole-body mesh recovery aims to estimate 3D human body, face, and hand parameters from a single image. It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions. Existing works usually detect hands and faces, enlarge their resolution to feed them into a part-specific network that predicts the parameters, and finally fuse the results. While this copy-paste pipeline can capture the fine-grained details of the face and hands, the connections between different parts cannot be easily recovered in late fusion, leading to implausible 3D rotations and unnatural poses. In this work, we propose a one-stage pipeline for expressive whole-body mesh recovery, named OSX, without separate networks for each part. Specifically, we design a Component Aware Transformer (CAT) composed of a global body encoder and a local face/hand decoder. The encoder predicts the body parameters and provides a high-quality feature map for the decoder, which performs a feature-level upsample-crop scheme to extract high-resolution part-specific features and adopts keypoint-guided deformable attention to estimate the hands and face precisely. The whole pipeline is simple yet effective without any manual post-processing and naturally avoids implausible predictions. Comprehensive experiments demonstrate the effectiveness of OSX. Lastly, we build a large-scale Upper-Body dataset (UBody) with high-quality 2D and 3D whole-body annotations. It contains persons with partially visible bodies in diverse real-life scenarios to bridge the gap between the basic task and downstream applications.
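The feature-level upsample-crop idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the OSX implementation: the abstract does not specify the upsampling method, map sizes, or box format, so nearest-neighbour upsampling, a single-channel 2D map stored as nested lists, the `(x0, y0, x1, y1)` box convention, and all function names below are assumptions made purely for illustration.

```python
def upsample_nearest(feat, scale):
    """Nearest-neighbour upsampling of a 2D feature map (a list of rows)."""
    out = []
    for row in feat:
        # Repeat each value `scale` times horizontally ...
        wide = [v for v in row for _ in range(scale)]
        # ... and each widened row `scale` times vertically.
        out.extend(list(wide) for _ in range(scale))
    return out


def crop(feat, box):
    """Crop a 2D map to box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in feat[y0:y1]]


def upsample_crop(feat, box, scale=4):
    """Upsample the shared feature map, then crop the part region from it.

    `box` is given in the coordinates of the low-resolution map, so it is
    scaled by the same factor before cropping.
    """
    high_res = upsample_nearest(feat, scale)
    scaled_box = tuple(v * scale for v in box)
    return crop(high_res, scaled_box)


# Hypothetical 8x6 body feature map whose entries encode their own position.
body_feat = [[10 * y + x for x in range(6)] for y in range(8)]
# Hypothetical hand box covering columns 2-3 and rows 5-6 of the small map.
hand_box = (2, 5, 4, 7)
hand_feat = upsample_crop(body_feat, hand_box, scale=2)
```

In this sketch the 2x2 hand region of the shared map becomes a 4x4 part-specific map. The point it illustrates is that the part features are cut from the feature map produced for the whole body, instead of from a separately cropped and re-encoded image patch.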
Pages: 21159-21168
Page count: 10