Sketch-Guided Text-to-Image Diffusion Models

Cited by: 25
Authors
Voynov, Andrey [1]
Aberman, Kfir [2]
Cohen-Or, Daniel [1, 3]
Affiliations
[1] Google Res, Tel Aviv, Israel
[2] Google Res, San Francisco, CA USA
[3] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
Keywords
diffusion models; image translation;
DOI
10.1145/3588432.3591560
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text-prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model, with a spatial map from another domain (e.g., sketch) during inference time. Unlike previous works, our method does not require to train a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained only on a few thousand images and constitutes a differential guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality which allows the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain.
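The guidance idea in the abstract (a per-pixel MLP predicts a spatial map, and the loss against the target sketch is propagated back) can be illustrated with a toy sketch. This is not the authors' implementation: the weights are random rather than trained, the gradient is taken with respect to the per-pixel feature vectors directly rather than through a diffusion network to the noisy latent, and all names (`mlp_forward`, `guidance_step`, the sizes `C` and `HIDDEN`, the step size) are hypothetical choices made for illustration.

```python
import math
import random


def mlp_forward(f, W1, b1, w2, b2):
    """Per-pixel MLP: feature vector -> (hidden activations, scalar map value)."""
    h = [math.tanh(sum(w * x for w, x in zip(row, f)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * a for w, a in zip(w2, h)) + b2
    return h, y


def guidance_loss(features, target, params):
    """Sum of squared differences between predicted map and target sketch."""
    return sum((mlp_forward(f, *params)[1] - s) ** 2
               for f, s in zip(features, target))


def guidance_step(features, target, params, lr):
    """One gradient step on the features (assumption: features are free variables)."""
    W1, b1, w2, b2 = params
    updated = []
    for f, s in zip(features, target):
        h, y = mlp_forward(f, *params)
        dy = 2.0 * (y - s)                                   # dLoss/dy
        dpre = [dy * w * (1.0 - a * a) for w, a in zip(w2, h)]  # through tanh
        grad = [sum(dpre[j] * W1[j][i] for j in range(len(W1)))
                for i in range(len(f))]                      # dLoss/df
        updated.append([x - lr * g for x, g in zip(f, grad)])
    return updated


rng = random.Random(0)
C, HIDDEN = 4, 8  # hypothetical feature channels and hidden units

# Random stand-in for a trained predictor (assumption: no training shown here).
params = (
    [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(HIDDEN)],
    [0.0] * HIDDEN,
    [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    0.0,
)

# A flattened 4x4 grid of per-pixel features and a target "sketch"
# with a single vertical line in column 1.
features = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(16)]
target = [1.0 if i % 4 == 1 else 0.0 for i in range(16)]

loss_before = guidance_loss(features, target, params)
for _ in range(300):
    features = guidance_step(features, target, params, lr=0.005)
loss_after = guidance_loss(features, target, params)

print("loss decreased:", loss_after < loss_before)
```

Because the predictor acts on each pixel independently, each pixel's features are pushed toward agreement with the sketch at that location only, which is a small-scale analogue of the locality the abstract attributes to per-pixel training.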
Pages: 11