AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Authors
Wang, Xinzhou [1 ,2 ,3 ,4 ]
Wang, Yikai [2 ]
Yee, Junliang [2 ]
Sung, Fuchun [2 ]
Wang, Zhengyi [2 ,3 ]
Wang, Ling [2 ,6 ]
Liu, Pengkun [2 ,7 ]
Sung, Kai [2 ]
Wan, Xintong [8 ]
Xie, Wende [5 ]
Liu, Fangfu [2 ]
He, Bin [1 ]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] ShengShu, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
[5] Didi, Beijing, Peoples R China
[6] Xian Res Inst High Tech, Xian, Peoples R China
[7] Fudan Univ, Shanghai, Peoples R China
[8] Zhejiang Univ, Hangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; US National Science Foundation;
Keywords
4D generation; Diffusion model; Non-rigid reconstruction;
DOI
10.1007/978-3-031-72698-9_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Advances in 3D generation have facilitated sequential 3D model generation (a.k.a. 4D generation), yet its application to animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporally consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model with warp-robustness across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods. Project page: https://zz7379.github.io/AnimatableDreamer/
Pages: 321-339
Page count: 19