Sketch-Guided Text-to-Image Diffusion Models

Cited by: 25
Authors
Voynov, Andrey [1]
Aberman, Kfir [2]
Cohen-Or, Daniel [1, 3]
Affiliations
[1] Google Res, Tel Aviv, Israel
[2] Google Res, San Francisco, CA USA
[3] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
Keywords
diffusion models; image translation
DOI
10.1145/3588432.3591560
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model with a spatial map from another domain (e.g., a sketch) at inference time. Unlike previous works, our method does not require training a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP), a small, per-pixel Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained on only a few thousand images and constitutes a differentiable guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality, which allow the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain.
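The guidance mechanism described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on several assumptions: the MLP has one ReLU hidden layer and a sigmoid output, its weights are supplied by the caller rather than trained, the loss is a squared error against a binary sketch value, and the per-pixel feature vectors are updated directly as a stand-in for the noisy latent (in the paper the gradient is propagated back through the denoising network to the intermediate image). All function names and parameters here are hypothetical.

```python
import math


def predict_pixel(params, feat):
    """Per-pixel MLP: map one pixel's feature vector to a map value in (0, 1)."""
    w1, b1, w2, b2 = params
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * f for w, f in zip(row, feat)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit)), hidden


def guidance_grad(params, feat, target):
    """Gradient of (prediction - target)^2 with respect to the pixel's features."""
    w1, _, w2, _ = params
    pred, hidden = predict_pixel(params, feat)
    # Chain rule through the squared error and the sigmoid.
    dlogit = 2.0 * (pred - target) * pred * (1.0 - pred)
    grad = [0.0] * len(feat)
    for row, w_out, h in zip(w1, w2, hidden):
        if h > 0.0:  # ReLU passes gradient only through active units
            for i, w_in in enumerate(row):
                grad[i] += dlogit * w_out * w_in
    return grad


def mean_sq_error(params, feats, sketch):
    """Mean squared error between the predicted map and the target sketch."""
    errors = [(predict_pixel(params, f)[0] - t) ** 2 for f, t in zip(feats, sketch)]
    return sum(errors) / len(errors)


def guide(params, feats, sketch, step=0.5, iters=50):
    """Gradient descent on a copy of the features toward agreement with the sketch."""
    feats = [list(f) for f in feats]  # leave the caller's features untouched
    for _ in range(iters):
        for feat, target in zip(feats, sketch):
            grad = guidance_grad(params, feat, target)
            for i, g in enumerate(grad):
                feat[i] -= step * g
    return feats
```

Because every pixel is handled independently, the predictor has no notion of image size or global layout, which is one way to read the abstract's remark about the locality of per-pixel training.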
Pages: 11