Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

Cited by: 0
Authors
Jakab, Tomas [1]
Li, Ruining [1]
Wu, Shangzhe [1]
Rupprecht, Christian [1]
Vedaldi, Andrea [1]
Affiliations
[1] Univ Oxford, Visual Geometry Grp, Oxford, England
Funding
European Research Council
DOI
10.1109/3DV62453.2024.00051
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object category. However, these approaches heavily rely on manually curated clean training data, which are expensive to obtain. We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch. Additionally, we incorporate the diffusion model as a score to enhance the learning process. The idea involves randomizing certain aspects of the reconstruction, such as viewpoint and illumination, generating virtual views of the reconstructed 3D object, and allowing the 2D network to assess the quality of the resulting image, thus providing feedback to the reconstructor. Unlike work based on distillation, which produces a single 3D asset for each textual prompt, our approach yields a monocular reconstruction network capable of outputting a controllable 3D asset from any given image, whether real or generated, in a single forward pass in a matter of seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games. The code can be found on the project page at https://farm3d.github.io/.
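The feedback loop described in the abstract (randomize viewpoint and illumination, render a virtual view, let a 2D model assess it) can be illustrated with a toy sketch in the spirit of score distillation. This is not the authors' implementation. Everything below is an assumption made for illustration: the "3D object" is a 1D albedo vector optimized directly rather than predicted by a network, the "viewpoint" is a circular shift, the "illumination" is a scalar gain, and the diffusion model is replaced by the exact noise predictor of a Gaussian image prior. The names `toy_noise_predictor`, `render`, and `distill` are hypothetical.

```python
import numpy as np


def toy_noise_predictor(x_t, alpha, sigma, prior_mean=0.5, prior_std=0.1):
    """Stand-in for a pre-trained diffusion model (assumption).

    Returns E[eps | x_t] exactly for the case where clean images follow
    N(prior_mean, prior_std^2 I) and x_t = alpha * x + sigma * eps.
    """
    var_t = alpha**2 * prior_std**2 + sigma**2
    return sigma * (x_t - alpha * prior_mean) / var_t


def render(albedo, shift, light):
    """Toy renderer: circular shift as 'viewpoint', scalar gain as 'illumination'."""
    return light * np.roll(albedo, shift)


def distill(num_pixels=16, steps=500, lr=0.05, seed=0):
    """Optimize the toy albedo with score-distillation-style gradients."""
    rng = np.random.default_rng(seed)
    albedo = rng.uniform(0.0, 1.0, num_pixels)
    for _ in range(steps):
        # Randomize aspects of the reconstruction, then render a virtual view.
        shift = int(rng.integers(num_pixels))
        light = rng.uniform(0.8, 1.2)
        image = render(albedo, shift, light)

        # Add noise at a random level, as a diffusion model would expect.
        sigma = rng.uniform(0.2, 0.98)
        alpha = np.sqrt(1.0 - sigma**2)
        eps = rng.standard_normal(num_pixels)
        noisy = alpha * image + sigma * eps

        # Feedback signal: predicted noise minus the noise actually added.
        residual = toy_noise_predictor(noisy, alpha, sigma) - eps

        # Back-propagate through the renderer by hand (undo shift, scale by light).
        grad = light * np.roll(residual, -shift)
        albedo -= lr * grad
    return albedo


if __name__ == "__main__":
    result = distill()
    print("mean albedo after distillation:", float(result.mean()))
```

In this toy setting the albedo drifts toward the prior mean regardless of the sampled shift and gain, which mirrors the idea that the 2D model's assessment of randomized renders feeds back into the reconstruction. A real system would replace the Gaussian stand-in with an actual diffusion model and the hand-written gradient with automatic differentiation through a differentiable renderer.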
Pages: 852-861
Number of pages: 10