Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion

Cited by: 0

Authors:

Jakab, Tomas [1]

Li, Ruining [1]

Wu, Shangzhe [1]

Rupprecht, Christian [1]

Vedaldi, Andrea [1]

Affiliations:

[1] University of Oxford, Visual Geometry Group, Oxford, England

Source:

2024 International Conference on 3D Vision (3DV 2024) | 2024

Funding:

European Research Council

Keywords:

DOI:

10.1109/3DV62453.2024.00051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object category. However, these approaches heavily rely on manually curated clean training data, which are expensive to obtain. We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch. Additionally, we incorporate the diffusion model as a score to enhance the learning process. The idea involves randomizing certain aspects of the reconstruction, such as viewpoint and illumination, generating virtual views of the reconstructed 3D object, and allowing the 2D network to assess the quality of the resulting image, thus providing feedback to the reconstructor. Unlike work based on distillation, which produces a single 3D asset for each textual prompt, our approach yields a monocular reconstruction network capable of outputting a controllable 3D asset from any given image, whether real or generated, in a single forward pass in a matter of seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games. The code can be found on the project page at https://farm3d.github.io/.


Pages: 852-861

Page count: 10
