SINE: SINgle Image Editing with Text-to-Image Diffusion Models

Cited by: 12
Authors
Zhang, Zhixing [1]
Han, Ligong [1]
Ghosh, Arnab [2]
Metaxas, Dimitris [1]
Ren, Jian [2]
Affiliations
[1] Rutgers State Univ, New Brunswick, NJ 08901 USA
[2] Snap Inc, Santa Monica, CA USA
DOI
10.1109/CVPR52729.2023.00584
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent works on diffusion models have demonstrated a strong capability for conditioning image generation, e.g., text-guided image synthesis. This success has inspired many efforts to use large-scale pre-trained diffusion models to tackle a challenging problem: real image editing. Existing works in this area learn a unique textual token corresponding to several images containing the same object. However, in many circumstances only one image is available, such as the painting Girl with a Pearl Earring. Using existing approaches to fine-tune pre-trained diffusion models on a single image causes severe overfitting. Information leakage from the pre-trained diffusion models makes the edit unable to keep the same content as the given image while creating the new features described by the language guidance. This work aims to address the problem of single-image editing. We propose a novel model-based guidance built upon classifier-free guidance so that the knowledge from the model trained on a single image can be distilled into the pre-trained diffusion model, enabling content creation even with one given image. Additionally, we propose a patch-based fine-tuning that can effectively help the model generate images of arbitrary resolution. We provide extensive experiments to validate the design choices of our approach and show promising editing capabilities, including changing style, content addition, and object manipulation. Our code is made publicly available.
Pages: 6027-6037
Page count: 11
Related papers
50 in total
  • [1] Editing Implicit Assumptions in Text-to-Image Diffusion Models
    Orgad, Hadas
    Kawar, Bahjat
    Belinkov, Yonatan
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7030 - 7038
  • [2] Towards Consistent Video Editing with Text-to-Image Diffusion Models
    Zhang, Zicheng
    Li, Bonan
    Nie, Xuecheng
    Han, Congying
    Guo, Tiande
    Liu, Luoqi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Ablating Concepts in Text-to-Image Diffusion Models
    Kumari, Nupur
    Zhang, Bingliang
    Wang, Sheng-Yu
    Shechtman, Eli
    Zhang, Richard
    Zhu, Jun-Yan
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22634 - 22645
  • [4] Unleashing Text-to-Image Diffusion Models for Visual Perception
    Zhao, Wenliang
    Rao, Yongming
    Liu, Zuyan
    Liu, Benlin
    Zhou, Jie
    Lu, Jiwen
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5706 - 5716
  • [5] Adding Conditional Control to Text-to-Image Diffusion Models
    Zhang, Lvmin
    Rao, Anyi
    Agrawala, Maneesh
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3813 - 3824
  • [6] Discriminative Class Tokens for Text-to-Image Diffusion Models
    Schwartz, Idan
    Snaebjarnarson, Vesteinn
    Chefer, Hila
    Belongie, Serge
    Wolf, Lior
    Benaim, Sagie
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22668 - 22678
  • [7] Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
    Wu, Qiucheng
    Liu, Yujian
    Zhao, Handong
    Kale, Ajinkya
    Bui, Trung
    Yu, Tong
    Lin, Zhe
    Zhang, Yang
    Chang, Shiyu
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1900 - 1910
  • [8] Out-of-Distribution with Text-to-Image Diffusion Models
    Tong, Jinglin
    Dai, Longquan
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XI, 2024, 14435 : 276 - 288
  • [9] Sketch-Guided Text-to-Image Diffusion Models
    Voynov, Andrey
    Aberman, Kfir
    Cohen-Or, Daniel
    [J]. PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023,
  • [10] Text-to-Image Diffusion Models are Zero-Shot Classifiers
    Clark, Kevin
    Jaini, Priyank
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,