Putting People in their Place: Monocular Regression of 3D People in Depth

Cited by: 79
Authors
Sun, Yu [1, 2]
Liu, Wu [2]
Bao, Qian [2]
Fu, Yili [1]
Mei, Tao [2]
Black, Michael J. [3]
Affiliations
[1] Harbin Institute of Technology, Harbin, People's Republic of China
[2] JD.com, Explore Academy, Beijing, People's Republic of China
[3] Max Planck Institute for Intelligent Systems, Tübingen, Germany
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
Funding
National Key Research and Development Program of China
Keywords
DOI
10.1109/CVPR52688.2022.01289
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g., from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combining these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion. The code and dataset are released for research purposes.
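The height/depth ambiguity mentioned in the abstract can be illustrated with the standard pinhole camera relation, in which a person of metric height H at depth z spans roughly h = f * H / z pixels. The sketch below is only an illustration of that relation, not the authors' method; the function name, focal length, pixel height, and body heights are all assumed values chosen for the example.

```python
def depth_from_height(focal_px: float, height_m: float, height_px: float) -> float:
    """Invert the pinhole relation h = f * H / z to get depth z in metres."""
    return focal_px * height_m / height_px


# Assumed values for illustration only.
focal_px = 1000.0   # hypothetical focal length, in pixels
height_px = 400.0   # observed height of the person in the image, in pixels

# The same 400-pixel silhouette is consistent with two very different depths,
# depending on the (unknown) true height of the person.
adult_depth = depth_from_height(focal_px, 1.70, height_px)  # assumed 1.70 m adult
child_depth = depth_from_height(focal_px, 0.85, height_px)  # assumed 0.85 m child

print(f"adult hypothesis: {adult_depth:.3f} m")
print(f"child hypothesis: {child_depth:.3f} m")
```

Under these assumptions, halving the assumed height halves the inferred depth, which is why an estimate of age or body shape is needed before depth can be resolved.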
Pages: 13233-13242
Page count: 10