3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Times Cited: 5
Authors
Yang, Haibo [1 ]
Chen, Yang [2 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Chen, Zhineng [1 ]
Mei, Tao [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D content creation via text-driven stylization poses a fundamental challenge to the multimedia and graphics communities. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem feasible. These approaches commonly leverage CLIP to align the holistic semantics of the stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes solely based on such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of a 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D Diffusion model to guide the learning of the rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design elegantly integrates image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality, fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
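To make the pipeline described above concrete, the following is a minimal PyTorch-style sketch of the core optimization loop it implies: an implicit MLP texture field on a fixed mesh is optimized with a score-distillation-style gradient from a depth-conditioned 2D diffusion model. This is not the authors' implementation; the `renderer` and `diffusion` objects (including `sample_random_view`, `render`, `add_noise`, and `predict_noise`) are hypothetical interfaces standing in for a differentiable mesh renderer and a ControlNet-style depth-conditioned diffusion model, and the reflectance parameterization shown is illustrative only.

```python
# A minimal, hedged sketch (not the authors' code) of depth-conditioned score
# distillation for text-driven mesh stylization, as outlined in the abstract.
import torch
import torch.nn as nn


class TextureField(nn.Module):
    """Implicit MLP mapping mesh surface points to reflectance properties.

    The 5-channel output (albedo + roughness + specular) is an illustrative
    split, not the paper's exact parameterization.
    """

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(xyz))


def stylize(mesh, text_embed, renderer, diffusion,
            n_iters: int = 5000, lr: float = 1e-3,
            grad_weight: float = 1.0, device: str = "cuda"):
    """Optimize the texture field so rendered views match the text prompt.

    `renderer` and `diffusion` are hypothetical interfaces: a differentiable
    mesh renderer (which also yields the exact mesh depth per view) and a
    depth-conditioned 2D diffusion model; any latent encoding is assumed to
    happen inside `diffusion`.
    """
    texture = TextureField().to(device)
    optim = torch.optim.Adam(texture.parameters(), lr=lr)

    for _ in range(n_iters):
        camera = renderer.sample_random_view()                # hypothetical
        rgb, depth = renderer.render(mesh, texture, camera)   # hypothetical

        # Perturb the rendered view to a random diffusion timestep and let the
        # model predict the noise, conditioned on the prompt and the depth map.
        t = torch.randint(20, 980, (1,), device=device)
        noise = torch.randn_like(rgb)
        noisy = diffusion.add_noise(rgb, noise, t)                       # hypothetical
        eps_pred = diffusion.predict_noise(noisy, t, text_embed, depth)  # hypothetical

        # Score-distillation-style update: the detached residual acts as the
        # gradient pushed back through the differentiable renderer into the MLP.
        grad = grad_weight * (eps_pred - noise)
        loss = (grad.detach() * rgb).sum()

        optim.zero_grad()
        loss.backward()
        optim.step()

    return texture
```

The `(grad.detach() * rgb).sum()` construction simply injects the noise residual as the gradient of the rendered image, which is the standard way score-distillation-style guidance is wired into a differentiable renderer without backpropagating through the diffusion U-Net.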
Pages: 6860-6868
Number of pages: 9
Related Papers
50 records total
  • [1] ControlNeRF: Text-Driven 3D Scene Stylization via Diffusion Model
    Chen, Jiahui
    Yang, Chuanfeng
    Li, Kaiheng
    Wu, Qingqiang
    Hong, Qingqi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II, 2024, 15017 : 395 - 406
  • [2] Enhanced Fine-Grained Motion Diffusion for Text-Driven Human Motion Synthesis
    Wei, Dong
    Sun, Xiaoning
    Sun, Huaijiang
    Hu, Shengxiang
    Li, Bin
    Li, Weiqing
    Lu, Jianfeng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5876 - 5884
  • [3] FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
    Xu, Jinglin
    Guo, Yijie
    Peng, Yuxin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 561 - 570
  • [4] TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition
    Chen, Yongwei
    Chen, Rui
    Lei, Jiabao
    Zhang, Yabin
    Jia, Kui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] LEVERAGING 2D AND 3D CUES FOR FINE-GRAINED OBJECT CLASSIFICATION
    Wang, Xiaolong
    Li, Robert
    Currey, Jon
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 1354 - 1358
  • [6] GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
    Yi, Taoran
    Fang, Jiemin
    Wang, Junjie
    Wu, Guanjun
    Xie, Lingxi
    Zhang, Xiaopeng
    Liu, Wenyu
    Tian, Qi
    Wang, Xinggang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6796 - 6807
  • [7] Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
    Wang, Yin
    Leng, Zhiying
    Li, Frederick W. B.
    Wu, Shun-Cheng
    Liang, Xiaohui
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21978 - 21987
  • [8] TextANIMAR: Text-based 3D animal fine-grained retrieval
    Le, Trung-Nghia
    Nguyen, Tam V.
    Le, Minh-Quan
    Nguyen, Trong-Thuan
    Huynh, Viet-Tham
    Do, Trong-Le
    Le, Khanh-Duy
    Tran, Mai-Khiem
    Hoang-Xuan, Nhat
    Nguyen-Ho, Thang-Long
    Nguyen, Vinh-Tiep
    Diep, Tuong-Nghiem
    Ho, Khanh-Duy
    Nguyen, Xuan-Hieu
    Tran, Thien-Phuc
    Yang, Tuan-Anh
    Tran, Kim-Phat
    Hoang, Nhu-Vinh
    Nguyen, Minh-Quang
    Nguyen, E-Ro
    Nguyen-Nhat, Minh-Khoi
    To, Tuan-An
    Huynh-Le, Trung-Truc
    Nguyen, Nham-Tan
    Luong, Hoang-Chau
    Phong, Truong Hoai
    Le-Pham, Nhat-Quynh
    Pham, Huu-Phuc
    Hoang, Trong-Vu
    Nguyen, Quang-Binh
    Nguyen, Hai-Dang
    Sugimoto, Akihiro
    Tran, Minh-Triet
    COMPUTERS & GRAPHICS-UK, 2023, 116 : 162 - 172
  • [9] Talk-to-Edit: Fine-Grained 2D and 3D Facial Editing via Dialog
    Jiang, Yuming
    Huang, Ziqi
    Wu, Tianxing
    Pan, Xingang
    Loy, Chen Change
    Liu, Ziwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3692 - 3706
  • [10] Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration
    Yang, Xiaofeng
    Liu, Fayao
    Xu, Yi
    Su, Hanjing
    Wu, Qingyao
    Lin, Guosheng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6549 - 6557