MannequinChallenge: Learning the Depths of Moving People by Watching Frozen People

Cited by: 13
Authors
Li, Zhengqi [1]
Dekel, Tali [2]
Cole, Forrester [2]
Tucker, Richard [2]
Snavely, Noah [1, 2]
Liu, Ce [2]
Freeman, William T. [2]
Affiliations
[1] Cornell Univ, Cornell Tech, Dept Comp Sci, Ithaca, NY 14850 USA
[2] Google Res, Mountain View, CA 94043 USA
Keywords
Cameras; Three-dimensional displays; Cleaning; Internet; Image reconstruction; Geometry; Training; Depth prediction; mannequin challenge; dynamic scene reconstruction;
DOI
10.1109/TPAMI.2020.2974454
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Because the people are stationary, geometric constraints hold, so training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We evaluate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and demonstrate various 3D effects produced using our predicted depth.
Pages: 4229-4241
Number of pages: 13