AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Citations: 0
Authors
Wang, Xinzhou [1 ,2 ,3 ,4 ]
Wang, Yikai [2 ]
Yee, Junliang [2 ]
Sung, Fuchun [2 ]
Wang, Zhengyi [2 ,3 ]
Wang, Ling [2 ,6 ]
Liu, Pengkun [2 ,7 ]
Sung, Kai [2 ]
Wan, Xintong [8 ]
Xie, Wende [5 ]
Liu, Fangfu [2 ]
He, Bin [1 ]
Affiliations
[1] Tongji Univ, Shanghai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] ShengShu, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
[5] Didi, Beijing, Peoples R China
[6] Xian Res Inst High Tech, Xian, Peoples R China
[7] Fudan Univ, Shanghai, Peoples R China
[8] Zhejiang Univ, Hangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; U.S. National Science Foundation;
Keywords
4D generation; Diffusion model; Non-rigid reconstruction;
DOI
10.1007/978-3-031-72698-9_19
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Advances in 3D generation have facilitated sequential 3D model generation (a.k.a. 4D generation), yet its application for animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporally consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model with warp-robustness across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from the monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods. Project page: https://zz7379.github.io/AnimatableDreamer/.
Pages: 321 - 339
Page count: 19
Related papers
50 in total
  • [41] DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
    Lei, Biwen
    Yu, Kai
    Feng, Mengyang
    Cui, Miaomiao
    Xie, Xuansong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10487 - 10497
  • [42] Rigid and non-rigid 3D shape classification based on 3D Hahn moments neural networks model
    Zouhir Lakhili
    Abdelmajid El Alami
    Abderrahim Mesbah
    Aissam Berrahou
    Hassan Qjidaa
    Multimedia Tools and Applications, 2022, 81 : 38067 - 38090
  • [43] Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
    Li, Zongrui
    Hu, Minghui
    Zheng, Qian
    Jiang, Xudong
    COMPUTER VISION-ECCV 2024, PT XLIII, 2025, 15101 : 274 - 291
  • [44] Euclidean-distance-based canonical forms for non-rigid 3D shape retrieval
    Pickup, David
    Sun, Xianfang
    Rosin, Paul L.
    Martin, Ralph R.
    PATTERN RECOGNITION, 2015, 48 (08) : 2500 - 2512
  • [45] Hybrid shape descriptor and meta similarity generation for non-rigid and partial 3D model retrieval
    Bo Li
    Afzal Godil
    Henry Johan
    Multimedia Tools and Applications, 2014, 72 : 1531 - 1560
  • [46] HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
    Liu, Hongyu
    Wang, Xuang
    Wan, Ziyu
    Shen, Yujun
    Song, Yibing
    Liao, Jing
    Chen, Qifeng
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [47] Hybrid shape descriptor and meta similarity generation for non-rigid and partial 3D model retrieval
    Li, Bo
    Godil, Afzal
    Johan, Henry
    MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 72 (02) : 1531 - 1560
  • [48] CLIP-Head: Text-Guided Generation of Textured Neural Parametric 3D Head Models
    Manu, Pranav
    Srivastava, Astitva
    Sharma, Avinash
    PROCEEDINGS SIGGRAPH ASIA 2023 TECHNICAL COMMUNICATIONS, SA TECHNICAL COMMUNICATIONS 2023, 2023,
  • [49] ClipFace: Text-guided Editing of Textured 3D Morphable Models
    Aneja, Shivangi
    Thies, Justus
    Dai, Angela
    Niessner, Matthias
    PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023,
  • [50] Image Collection Pop-up: 3D Reconstruction and Clustering of Rigid and Non-Rigid Categories
    Agudo, Antonio
    Pijoan, Melcior
    Moreno-Noguer, Francesc
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2607 - 2615