Shifted Diffusion for Text-to-image Generation

Cited by: 15

Authors:

Zhou, Yufan [1]

Liu, Bingchen [2]

Zhu, Yizhe [2]

Yang, Xiao [2]

Chen, Changyou [1]

Xu, Jinhui [1]

Affiliations:

[1] SUNY Buffalo, Buffalo, NY 14260 USA

[2] ByteDance, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.00979

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which generates better image embeddings from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes the prior knowledge of the pre-trained CLIP model into its diffusion process by designing a new initialization distribution and a new diffusion transition step. Compared to the strong DALL-E 2 baseline, our method generates image embeddings from text both more efficiently and more effectively, resulting in better text-to-image generation. Extensive large-scale experiments, evaluated with both quantitative measures and human evaluation, indicate that our method has a stronger generation ability than existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only some or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
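The abstract only states that shifted diffusion changes the initialization distribution and the transition step so that CLIP's prior is built into the process. The sketch below is an illustrative reading, not the authors' code, and the paper's exact parameterization may differ. It assumes the prior is summarized by a Gaussian N(mu, Sigma) over image embeddings and that each forward step shifts the mean toward mu, so that the chain converges to N(mu, Sigma) instead of N(0, I). The names `shifted_forward_step` and `run_forward`, the 2-D statistics, and the beta schedule are all hypothetical.

```python
import numpy as np


def shifted_forward_step(x_prev, mu, sigma_chol, beta, rng):
    """One forward (noising) step of a shifted diffusion (illustrative sketch).

    Assumes the target distribution is N(mu, Sigma) with Sigma = L @ L.T,
    L = sigma_chol. The step is
        x_t = sqrt(1 - beta) * x_prev + (1 - sqrt(1 - beta)) * mu + sqrt(beta) * eps,
    with eps ~ N(0, Sigma), which leaves N(mu, Sigma) stationary.
    """
    a = np.sqrt(1.0 - beta)
    # Row vectors z ~ N(0, I) mapped through L.T have covariance L @ L.T = Sigma.
    eps = rng.standard_normal(x_prev.shape) @ sigma_chol.T
    # The shift term pulls the mean toward mu instead of toward the origin.
    return a * x_prev + (1.0 - a) * mu + np.sqrt(beta) * eps


def run_forward(x0, mu, sigma, betas, seed=0):
    """Apply the shifted forward step once per entry of the beta schedule."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(sigma)  # lower-triangular L with L @ L.T = sigma
    x = np.asarray(x0, dtype=float)
    for beta in betas:
        x = shifted_forward_step(x, mu, chol, beta, rng)
    return x


if __name__ == "__main__":
    # Hypothetical 2-D stand-ins for CLIP image-embedding statistics.
    mu = np.array([2.0, -1.0])
    sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
    x0 = np.zeros((20000, 2))  # every sample starts away from mu
    xT = run_forward(x0, mu, sigma, np.linspace(0.01, 0.2, 200))
    print(xT.mean(axis=0))           # should be close to mu
    print(np.cov(xT, rowvar=False))  # should be close to sigma
```

The shift term and the noise covariance are matched on purpose: if x_{t-1} ~ N(mu, Sigma), the mean of x_t is sqrt(1 - beta) mu + (1 - sqrt(1 - beta)) mu = mu and its covariance is (1 - beta) Sigma + beta Sigma = Sigma. Under this assumption, reverse-time sampling could start from N(mu, Sigma) rather than from a standard normal.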

Pages: 10157-10166

Number of pages: 10