Putting People in their Place: Monocular Regression of 3D People in Depth

Times cited: 79
Authors
Sun, Yu [1, 2]
Liu, Wu [2]
Bao, Qian [2]
Fu, Yili [1]
Mei, Tao [2]
Black, Michael J. [3]
Affiliations
[1] Harbin Institute of Technology, Harbin, People's Republic of China
[2] JD.com, JD Explore Academy, Beijing, People's Republic of China
[3] Max Planck Institute for Intelligent Systems, Tübingen, Germany
Funding
National Key R&D Program of China
DOI
10.1109/CVPR52688.2022.01289
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g., from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combining these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion. The code and dataset are released for research purposes.
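The abstract's claim that depth is ambiguous without height can be made concrete with the standard pinhole camera model. This is a textbook relation and not BEV's method. A person of real height H at depth Z projects to roughly h = f·H/Z pixels, so Z = f·H/h. The sketch below treats the body as an upright segment and ignores pose and lens distortion. The focal length, pixel heights, and body heights are hypothetical values chosen for illustration.

```python
def depth_from_height(focal_px: float, real_height_m: float, image_height_px: float) -> float:
    """Pinhole model: an object of real height H at depth Z projects to
    h = f * H / Z pixels, so Z = f * H / h (simplified: upright body, no distortion)."""
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * real_height_m / image_height_px


# Hypothetical setup: two people who both appear 200 px tall in the same image,
# captured with an assumed focal length of 1000 px.
FOCAL_PX = 1000.0
APPARENT_HEIGHT_PX = 200.0

adult_depth = depth_from_height(FOCAL_PX, 1.70, APPARENT_HEIGHT_PX)  # assumed adult height (m)
child_depth = depth_from_height(FOCAL_PX, 1.00, APPARENT_HEIGHT_PX)  # assumed child height (m)

print(f"adult: {adult_depth:.2f} m, child: {child_depth:.2f} m")
```

Running it prints `adult: 8.50 m, child: 5.00 m`. Identical image evidence is therefore consistent with depths that differ by several meters, depending on the assumed body height. This is the motivation the abstract gives for estimating age-dependent body shape jointly with depth.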
Pages: 13233-13242
Number of pages: 10