Putting People in their Place: Monocular Regression of 3D People in Depth

Cited by: 79
Authors
Sun, Yu [1, 2]
Liu, Wu [2]
Bao, Qian [2]
Fu, Yili [1]
Mei, Tao [2]
Black, Michael J. [3]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] JD Com, Explore Acad, Beijing, Peoples R China
[3] Max Planck Inst Intelligent Syst, Tubingen, Germany
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
Funding
National Key Research and Development Program of China
DOI
10.1109/CVPR52688.2022.01289
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g., from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combining these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion. The code and dataset are released for research purposes.
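The height and depth ambiguity mentioned in the abstract can be illustrated with a standard pinhole camera model. The sketch below is not the BEV method; the focal length, body heights, and function name are assumptions chosen only for illustration. It shows that the same apparent height in pixels maps to different depths depending on the assumed body height.

```python
def depth_from_pixel_height(focal_px: float, body_height_m: float,
                            pixel_height: float) -> float:
    """Depth under a pinhole model, where pixel_height = focal_px * body_height_m / depth."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * body_height_m / pixel_height


# Assumed values, for illustration only.
FOCAL_PX = 1000.0      # hypothetical focal length in pixels
PIXEL_HEIGHT = 400.0   # both people appear 400 px tall in the image

# An adult (assumed 1.7 m) and a child (assumed 1.0 m) with the same pixel height.
adult_depth = depth_from_pixel_height(FOCAL_PX, 1.7, PIXEL_HEIGHT)
child_depth = depth_from_pixel_height(FOCAL_PX, 1.0, PIXEL_HEIGHT)

print(f"adult depth: {adult_depth:.2f} m, child depth: {child_depth:.2f} m")
```

Under these assumed numbers the adult would be roughly 4.25 m away and the child roughly 2.5 m away, even though they occupy the same number of pixels. This is why an estimate of age or body shape is needed before depth can be resolved.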
Pages: 13233-13242
Page count: 10