Shifted Diffusion for Text-to-image Generation

Cited by: 15
Authors
Zhou, Yufan [1]
Liu, Bingchen [2]
Zhu, Yizhe [2]
Yang, Xiao [2]
Chen, Changyou [1]
Xu, Jinhui [1]
Affiliations
[1] SUNY Buffalo, Buffalo, NY 14260 USA
[2] ByteDance, Beijing, People's Republic of China
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.00979
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better in generating image embedding from the text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments are conducted and evaluated in terms of both quantitative measures and human evaluation, indicating a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
Pages: 10157-10166
Number of pages: 10