3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Cited by: 5
Authors
Yang, Haibo [1 ]
Chen, Yang [2 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Chen, Zhineng [1 ]
Mei, Tao [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
3D content creation via text-driven stylization poses a fundamental challenge to the multimedia and graphics communities. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem tractable. Such approaches commonly leverage CLIP to align the holistic semantics of the stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes solely with such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of the 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D diffusion model to guide the learning of rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design elegantly integrates both image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality, fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
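The abstract describes a score-distillation-style objective in which differentiably rendered views are pushed toward images that a depth-conditioned 2D diffusion model finds likely. Below is a minimal PyTorch sketch of one such update, written under stated assumptions: render_fn, denoiser (a ControlNet-style depth-conditioned noise predictor), and text_emb are illustrative placeholders, not names from the paper's released code.

import torch
import torch.nn.functional as F

def depth_guided_sds_loss(render_fn, denoiser, text_emb, alphas_cumprod):
    """One stylization step: render a view, diffuse it, and score the
    rendering with a depth-conditioned 2D diffusion model.

    All argument names are hypothetical stand-ins for the implicit-MLP
    renderer, the pretrained controllable diffusion model, and the
    encoded text prompt.
    """
    # 1) Differentiably render RGB and depth for a sampled camera. The
    #    implicit MLPs predict reflectance and lighting; the mesh itself is
    #    fixed, so the depth map carries no gradient.
    rgb, depth = render_fn()  # rgb: (1, 3, H, W), depth: (1, 1, H, W)

    # 2) Diffuse the rendering to a random timestep t.
    t = torch.randint(20, 980, (1,), device=rgb.device)
    noise = torch.randn_like(rgb)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rgb + (1.0 - a_t).sqrt() * noise

    # 3) Predict the noise, conditioning on both the text prompt and the
    #    rendered depth map so the guidance stays geometry-consistent.
    with torch.no_grad():
        eps_pred = denoiser(noisy, t, text_emb, control=depth.detach())

    # 4) Score-distillation gradient: the MSE surrogate below has
    #    d(loss)/d(rgb) equal to the weighted denoising residual.
    grad = (1.0 - a_t) * (eps_pred - noise)
    target = (rgb - grad).detach()
    return 0.5 * F.mse_loss(rgb, target, reduction="sum")

In training, the returned loss would be backpropagated into the implicit MLP parameters encoding reflectance and scene lighting, coupling rendering and the diffusion prior end-to-end as the abstract describes; the detached-target trick is one common way to inject the score-distillation gradient without differentiating through the diffusion model itself.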
Pages: 6860-6868
Number of pages: 9
Related Papers
50 items in total; items [31]-[40] shown
  • [31] Fine-Grained 3D Reconfigurable Computing Fabric with RRAM. Li, Mingyu; Shi, Jiajun; Bhat, Sachin; Moritz, Csaba Andras. PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES (NANOARCH 2017), 2017: 79-80.
  • [32] 3D Pose Estimation for Fine-Grained Object Categories. Wang, Yaming; Tan, Xiao; Yang, Yi; Liu, Xiao; Ding, Errui; Zhou, Feng; Davis, Larry S. COMPUTER VISION - ECCV 2018 WORKSHOPS, PT I, 2019, 11129: 619-632.
  • [33] Text2NeRF: Text-Driven 3D Scene Generation With Neural Radiance Fields. Zhang, Jingbo; Li, Xiaoyu; Wan, Ziyu; Wang, Can; Liao, Jing. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30(12): 7749-7762.
  • [34] SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. Delitzas, Alexandros; Takmaz, Ayca; Tombari, Federico; Sumner, Robert; Pollefeys, Marc; Engelmann, Francis. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 14531-14542.
  • [35] Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion. Jakab, Tomas; Li, Ruining; Wu, Shangzhe; Rupprecht, Christian; Vedaldi, Andrea. 2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024: 852-861.
  • [36] Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models. Chung, Hyungjin; Ryu, Dohoon; McCann, Michael T.; Klasky, Marc L.; Ye, Jong Chul. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 22542-22551.
  • [37] An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese. Yu, Lingyun; Luo, Changwei; Yu, Jun. PATTERN RECOGNITION (CCPR 2016), PT I, 2016, 662: 93-104.
  • [38] InterFusion: Text-Driven Generation of 3D Human-Object Interaction. Dai, Sisi; Li, Wenhao; Sun, Haowen; Huang, Haibin; Ma, Chongyang; Huang, Hui; Xu, Kai; Hu, Ruizhen. COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106: 18-35.
  • [39] AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars. Mendiratta, Mohit; Pan, Xingang; Elgharib, Mohamed; Teotia, Kartik; Mallikarjun, B. R.; Tewari, Ayush; Golyanik, Vladislav; Kortylewski, Adam; Theobalt, Christian. ACM TRANSACTIONS ON GRAPHICS, 2023, 42(6).
  • [40] Diffusion models for 3D generation: A survey. Wang, Chen; Peng, Hao-Yang; Liu, Ying-Tian; Gu, Jiatao; Hu, Shi-Min. COMPUTATIONAL VISUAL MEDIA, 2025, 11(1): 1-28.