Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Cited by: 1
Authors
Wu, Yiqian [1]
Xu, Hao [1]
Tang, Xiangjun [1]
Chen, Xien [2]
Tang, Siyu [3]
Zhang, Zhebin [4]
Li, Chen [4]
Jin, Xiaogang [1]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Yale Univ, New Haven, CT USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] OPPO US Res Ctr, Menlo Pk, CA USA
Source
ACM TRANSACTIONS ON GRAPHICS | 2024 / Vol. 43 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
3D portrait generation; 3D-aware GANs; diffusion models;
DOI
10.1145/3658162
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Existing neural rendering-based text-to-3D-portrait generation methods typically make use of human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural rendering-based framework with a novel joint geometry-appearance prior to achieve text-to-3D-portrait generation that overcomes the aforementioned issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN, as a robust prior. This generator is capable of producing 360° canonical 3D portraits, serving as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifact caused by the high-frequency information in the feature-map-based 3D representation commonly used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN. To generate 3D portraits from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Beginning with the obtained pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into the pyramid tri-grid. Following that, we utilize the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating issues with unrealistic color and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
Pages: 12