Putting People in their Place: Monocular Regression of 3D People in Depth

Cited by: 79
Authors
Sun, Yu [1 ,2 ]
Liu, Wu [2 ]
Bao, Qian [2 ]
Fu, Yili [1 ]
Mei, Tao [2 ]
Black, Michael J. [3 ]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] JD Com, Explore Acad, Beijing, Peoples R China
[3] Max Planck Inst Intelligent Syst, Tubingen, Germany
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022
Funding
National Key R&D Program of China
Keywords
DOI
10.1109/CVPR52688.2022.01289
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g. from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an additional imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combining these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion. The code and dataset are released for research purposes.
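The abstract describes combining body-center estimates in the image plane with an imaginary bird's-eye-view representation to place people in 3D. The following is a minimal illustrative sketch, not the released BEV implementation: it assumes an image-plane center heatmap and a bird's-eye-view heatmap over horizontal position and discretized depth, combined multiplicatively into a 3D center volume from which coarse 3D body centers are read off. All names, shapes, and the composition rule are assumptions for illustration.

import numpy as np

# Hypothetical sketch of composing a 2D body-center heatmap with a
# bird's-eye-view heatmap into a 3D center volume; not taken from the
# authors' released code.

def compose_3d_centermap(front_view: np.ndarray, birds_eye_view: np.ndarray) -> np.ndarray:
    """front_view: (H, W) scores for body centers in the image plane.
    birds_eye_view: (W, D) scores over horizontal image position and depth bins.
    Returns an (H, W, D) volume scoring candidate 3D body centers."""
    # Broadcast so each (y, x) image score is modulated by the depth scores at x.
    return front_view[:, :, None] * birds_eye_view[None, :, :]

def top_k_centers(volume: np.ndarray, k: int = 3):
    """Pick the k highest-scoring (y, x, depth-bin) cells as coarse 3D centers."""
    flat_idx = np.argsort(volume, axis=None)[::-1][:k]
    return [np.unravel_index(i, volume.shape) for i in flat_idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D = 64, 64, 32                      # image height/width and depth bins (assumed)
    front_view = rng.random((H, W))           # stand-in for a predicted 2D centermap
    birds_eye_view = rng.random((W, D))       # stand-in for a predicted bird's-eye-view map
    centers = top_k_centers(compose_3d_centermap(front_view, birds_eye_view))
    print(centers)                            # e.g. [(y, x, depth_bin), ...]

The point of the sketch is only the combination step: reasoning separately over the image plane and over depth, then fusing the two into a single 3D localization, which the paper describes as the role of the bird's-eye-view representation.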
Pages: 13233-13242
Page count: 10
Related Papers
50 records in total
  • [31] The use of consumer depth cameras for 3D surface imaging of people with obesity: A feasibility study
    Wheat, J. S.
    Clarkson, S.
    Flint, S. W.
    Simpson, C.
    Broom, D. R.
    OBESITY RESEARCH & CLINICAL PRACTICE, 2018, 12 (06) : 528 - 533
  • [32] Multimodal People Re-Identification Using 3D Skeleton, Depth, and Color Information
    Patruno, Cosimo
    Reno, Vito
    Cicirelli, Grazia
    D'Orazio, Tiziana
    IEEE ACCESS, 2024, 12 : 174689 - 174704
  • [33] Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints
    Zanfir, Andrei
    Marinoiu, Elisabeta
    Sminchisescu, Cristian
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2148 - 2157
  • [34] eGAC3D: enhancing depth adaptive convolution and depth estimation for monocular 3D object pose detection
    Ngo, Duc Tuan
    Bui, Minh-Quan Viet
    Nguyen, Duc Dung
    Pham, Hoang-Anh
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [35] Monocular 3D object detection with thermodynamic loss and decoupled instance depth
    Liu, Gang
    Xie, Xiaoxiao
    Yu, Qingchen
    CONNECTION SCIENCE, 2024, 36 (01)
  • [36] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
    Huang, Kuan-Chih
    Wu, Tsung-Han
    Su, Hung-Ting
    Hsu, Winston H.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4002 - 4011
  • [37] RGB-Fusion: Monocular 3D reconstruction with learned depth prediction
    Duan, ZhiMin
    Chen, YingWen
    Yu, HuJie
    Hu, BoWen
    Chen, Chen
    DISPLAYS, 2021, 70
  • [38] MonoDFNet: Monocular 3D Object Detection with Depth Fusion and Adaptive Optimization
    Gao, Yuhan
    Wang, Peng
    Li, Xiaoyan
    Sun, Mengyu
    Di, Ruohai
    Li, Liangliang
    Hong, Wei
    SENSORS, 2025, 25 (03)
  • [39] Exploiting Ground Depth Estimation for Mobile Monocular 3D Object Detection
    Zhou, Yunsong
    Liu, Quan
    Zhu, Hongzi
    Li, Yunzhe
    Chang, Shan
    Guo, Minyi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 3079 - 3093
  • [40] DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
    Peng, Liang
    Wu, Xiaopei
    Yang, Zheng
    Liu, Haifeng
    Cai, Deng
    COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 71 - 88