DiffMat: Latent diffusion models for image-guided material generation

Cited by: 5
Authors
Yuan, Liang [1]
Yan, Dingkun [2]
Saito, Suguru [2]
Fujishiro, Issei [3]
Affiliations
[1] Keio Univ, Grad Sch Sci & Technol, Yokohama, Kanagawa, Japan
[2] Tokyo Inst Technol, Sch Comp, Tokyo, Japan
[3] Keio Univ, Dept Informat & Comp Sci, Yokohama, Kanagawa, Japan
Source
VISUAL INFORMATICS | 2024 / Vol. 8 / Issue 01
Keywords
SVBRDF; Diffusion model; Generative model; Appearance modeling;
DOI
10.1016/j.visinf.2023.12.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Creating realistic materials is essential in the construction of immersive virtual environments. While existing techniques for material capture and conditional generation rely on flash-lit photos, they often produce artifacts when the illumination mismatches the training data. In this study, we introduce DiffMat, a novel diffusion model that integrates the CLIP image encoder and a multi-layer, cross-attention denoising backbone to generate latent materials from images under various illuminations. Using a pre-trained StyleGAN-based material generator, our method converts these latent materials into high-resolution SVBRDF textures, a process that enables a seamless fit into the standard physically based rendering pipeline, reducing the requirements for vast computational resources and expansive datasets. DiffMat surpasses existing generative methods in terms of material quality and variety, and shows adaptability to a broader spectrum of lighting conditions in reference images. (c) 2024 The Authors. Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
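The abstract mentions a cross-attention denoising backbone conditioned on CLIP image features. As a rough illustration of that general mechanism only, the following pure-Python sketch shows single-head cross-attention in which latent tokens query a set of conditioning tokens. It is not the authors' implementation: the function names, the token counts, the dimensions, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latent tokens attend to conditioning tokens."""
    q = matmul(latent_tokens, w_q)   # queries come from the latent being denoised
    k = matmul(cond_tokens, w_k)     # keys come from the conditioning features
    v = matmul(cond_tokens, w_v)     # values come from the conditioning features
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
latent = random_matrix(4, 8, rng)    # 4 latent tokens, width 8 (assumed sizes)
cond = random_matrix(3, 16, rng)     # 3 conditioning tokens, width 16 (assumed sizes)
w_q = random_matrix(8, 8, rng)
w_k = random_matrix(16, 8, rng)
w_v = random_matrix(16, 8, rng)

out, attn = cross_attention(latent, cond, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over the conditioning tokens, and `out` has one updated vector per latent token. A real denoising network would stack several such layers with learned weights.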
Pages: 6-14
Number of pages: 9