Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Cited by: 1
Authors
Wu, Yiqian [1 ]
Xu, Hao [1 ]
Tang, Xiangjun [1 ]
Chen, Xien [2 ]
Tang, Siyu [3 ]
Zhang, Zhebin [4 ]
Li, Chen [4 ]
Jin, Xiaogang [1 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Yale Univ, New Haven, CT USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] OPPO US Res Ctr, Menlo Pk, CA USA
Source
ACM TRANSACTIONS ON GRAPHICS | 2024 / Vol. 43 / No. 4
Funding
National Natural Science Foundation of China;
Keywords
3D portrait generation; 3D-aware GANs; diffusion models;
DOI
10.1145/3658162
CLC number
TP31 [Computer Software];
Discipline codes
081202 ; 0835 ;
Abstract
Existing neural-rendering-based text-to-3D-portrait generation methods typically rely on human geometry priors and diffusion models for guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural-rendering-based framework with a joint geometry-appearance prior that achieves text-to-3D-portrait generation free of the aforementioned issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN, as a robust prior. This generator can produce 360° canonical 3D portraits, serving as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifacts caused by high-frequency information in the feature-map-based 3D representations commonly used by 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN. To generate a 3D portrait from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Starting from this pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into it. We then use the diffusion model to refine rendered images of the 3D portrait and use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating unrealistic colors and unnatural artifacts. Our experimental results show that Portrait3D produces realistic, high-quality, canonical 3D portraits that align with the prompt.
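The score distillation sampling (SDS) step described in the abstract can be sketched in a few lines. This is a minimal, self-contained illustration under simplifying assumptions, not the paper's implementation: `render` and `denoise` are hypothetical stand-ins for the differentiable pyramid-tri-grid renderer and the pretrained diffusion model's noise predictor, and the noise schedule is a toy one.

```python
import numpy as np

def sds_gradient(render, theta, denoise, t, rng):
    """Toy score distillation sampling gradient for renderer parameters theta.

    `render` and `denoise` are hypothetical stand-ins for the differentiable
    3D renderer and the pretrained diffusion model's noise predictor.
    """
    x = render(theta)                       # rendered image (flat array)
    alpha = 1.0 - t                         # toy noise schedule, t in (0, 1)
    eps = rng.standard_normal(x.shape)      # injected Gaussian noise
    x_t = np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * eps  # noised render
    eps_hat = denoise(x_t, t)               # diffusion model's noise estimate
    w = 1.0 - alpha                         # timestep weighting w(t)
    # SDS uses w(t) * (eps_hat - eps) as the gradient w.r.t. the rendered
    # image, skipping the diffusion model's Jacobian.
    return w * (eps_hat - eps)

# Minimal demo: identity renderer, and a denoiser whose implied data
# distribution is concentrated at the zero image, so SDS pulls theta there.
rng = np.random.default_rng(0)
theta = np.ones(4)
for _ in range(50):
    g = sds_gradient(lambda th: th, theta,
                     lambda x_t, t: x_t / np.sqrt(t), 0.5, rng)
    theta = theta - 0.1 * g
```

In the full method, the same loop would backpropagate this gradient through the renderer into the pyramid tri-grid, and the subsequent image-refinement stage would replace pure SDS updates.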
Pages: 12