3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Cited by: 5
Authors
Yang, Haibo [1 ]
Chen, Yang [2 ]
Pan, Yingwei [2 ]
Yao, Ting [3 ]
Chen, Zhineng [1 ]
Mei, Tao [3 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
3D content creation via text-driven stylization poses a fundamental challenge to the multimedia and graphics communities. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem tractable. Such approaches commonly leverage CLIP to align the holistic semantics of the stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes solely with such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of the 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D diffusion model to guide the learning of rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design elegantly integrates both image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality, fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
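The abstract describes a score-distillation-style objective in which differentiably rendered views are pushed toward images that a depth-conditioned 2D diffusion model finds likely. Below is a minimal PyTorch sketch of one such update, written under stated assumptions: render_fn, denoiser (a ControlNet-style depth-conditioned noise predictor), and text_emb are illustrative placeholders, not names from the paper's released code.

import torch
import torch.nn.functional as F

def depth_guided_sds_loss(render_fn, denoiser, text_emb, alphas_cumprod):
    """One stylization step: render a view, diffuse it, and score the
    rendering with a depth-conditioned 2D diffusion model.

    All argument names are hypothetical stand-ins for the implicit-MLP
    renderer, the pretrained controllable diffusion model, and the
    encoded text prompt.
    """
    # 1) Differentiably render RGB and depth for a sampled camera. The
    #    implicit MLPs predict reflectance and lighting; the mesh itself is
    #    fixed, so the depth map carries no gradient.
    rgb, depth = render_fn()  # rgb: (1, 3, H, W), depth: (1, 1, H, W)

    # 2) Diffuse the rendering to a random timestep t.
    t = torch.randint(20, 980, (1,), device=rgb.device)
    noise = torch.randn_like(rgb)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * rgb + (1.0 - a_t).sqrt() * noise

    # 3) Predict the noise, conditioning on both the text prompt and the
    #    rendered depth map so the guidance stays geometry-consistent.
    with torch.no_grad():
        eps_pred = denoiser(noisy, t, text_emb, control=depth.detach())

    # 4) Score-distillation gradient: the MSE surrogate below has
    #    d(loss)/d(rgb) equal to the weighted denoising residual.
    grad = (1.0 - a_t) * (eps_pred - noise)
    target = (rgb - grad).detach()
    return 0.5 * F.mse_loss(rgb, target, reduction="sum")

In training, the returned loss would be backpropagated into the implicit MLP parameters encoding reflectance and scene lighting, coupling rendering and the diffusion prior end-to-end as the abstract describes; the detached-target trick is one common way to inject the score-distillation gradient without differentiating through the diffusion model itself.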
Pages: 6860-6868
Number of pages: 9
Related Papers
50 items in total; items [31]-[40] shown
  • [31] Fine-Grained 3D Reconfigurable Computing Fabric with RRAM. Li, Mingyu; Shi, Jiajun; Bhat, Sachin; Moritz, Csaba Andras. PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES (NANOARCH 2017), 2017: 79-80.
  • [32] 3D Pose Estimation for Fine-Grained Object Categories. Wang, Yaming; Tan, Xiao; Yang, Yi; Liu, Xiao; Ding, Errui; Zhou, Feng; Davis, Larry S. COMPUTER VISION - ECCV 2018 WORKSHOPS, PT I, 2019, 11129: 619-632.
  • [33] Text2NeRF: Text-Driven 3D Scene Generation With Neural Radiance Fields. Zhang, Jingbo; Li, Xiaoyu; Wan, Ziyu; Wang, Can; Liao, Jing. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30(12): 7749-7762.
  • [34] SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. Delitzas, Alexandros; Takmaz, Ayca; Tombari, Federico; Sumner, Robert; Pollefeys, Marc; Engelmann, Francis. 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 14531-14542.
  • [35] Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion. Jakab, Tomas; Li, Ruining; Wu, Shangzhe; Rupprecht, Christian; Vedaldi, Andrea. 2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024: 852-861.
  • [36] Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models. Chung, Hyungjin; Ryu, Dohoon; McCann, Michael T.; Klasky, Marc L.; Ye, Jong Chul. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 22542-22551.
  • [37] An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese. Yu, Lingyun; Luo, Changwei; Yu, Jun. PATTERN RECOGNITION (CCPR 2016), PT I, 2016, 662: 93-104.
  • [38] InterFusion: Text-Driven Generation of 3D Human-Object Interaction. Dai, Sisi; Li, Wenhao; Sun, Haowen; Huang, Haibin; Ma, Chongyang; Huang, Hui; Xu, Kai; Hu, Ruizhen. COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106: 18-35.
  • [39] AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars. Mendiratta, Mohit; Pan, Xingang; Elgharib, Mohamed; Teotia, Kartik; Mallikarjun, B. R.; Tewari, Ayush; Golyanik, Vladislav; Kortylewski, Adam; Theobalt, Christian. ACM TRANSACTIONS ON GRAPHICS, 2023, 42(6).
  • [40] Diffusion models for 3D generation: A survey. Wang, Chen; Peng, Hao-Yang; Liu, Ying-Tian; Gu, Jiatao; Hu, Shi-Min. COMPUTATIONAL VISUAL MEDIA, 2025, 11(1): 1-28.