Shifted Diffusion for Text-to-image Generation

Cited by: 15
Authors
Zhou, Yufan [1 ]
Liu, Bingchen [2 ]
Zhu, Yizhe [2 ]
Yang, Xiao [2 ]
Chen, Changyou [1 ]
Xu, Jinhui [1 ]
Affiliations
[1] SUNY Buffalo, Buffalo, NY 14260 USA
[2] ByteDance, Beijing, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
DOI
10.1109/CVPR52729.2023.00979
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better at generating image embeddings from text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments, assessed with both quantitative metrics and human evaluation, indicate that our method has a stronger generation ability than existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
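The abstract's "new initialization distribution and new transition step" can be illustrated with a minimal sketch: instead of a standard DDPM forward process that drifts embeddings toward N(0, I), each transition is shifted so the terminal distribution matches the statistics of CLIP image embeddings. This is not the authors' implementation; the isotropic covariance, the schedule values, and the synthetic prior statistics below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 512, 1000  # embedding dimension and number of diffusion steps (illustrative)

# Prior statistics of CLIP image embeddings. Here they are synthetic stand-ins;
# in practice they would be estimated from a large corpus of image embeddings.
mu = rng.normal(size=D) * 0.1
sigma = 0.05  # assume an isotropic covariance sigma^2 * I for simplicity

betas = np.linspace(1e-4, 0.02, T)  # standard linear noise schedule

def shifted_forward_step(x_prev, t):
    """One forward step of a shifted diffusion.

    The mean is shifted toward mu (mu is a fixed point of the mean update),
    so after many steps x concentrates near N(mu, ~sigma^2 I) instead of
    the standard N(0, I)."""
    beta = betas[t]
    shift = (1.0 - np.sqrt(1.0 - beta)) * mu
    mean = np.sqrt(1.0 - beta) * x_prev + shift
    return mean + np.sqrt(beta) * sigma * rng.normal(size=x_prev.shape)

# Diffuse a (synthetic) clean image embedding toward the shifted prior.
x0 = rng.normal(size=D)
x = x0.copy()
for t in range(T):
    x = shifted_forward_step(x, t)
```

Because the terminal distribution already lies near the CLIP embedding manifold, the reverse (generative) process starts from an informed prior rather than pure noise, which is the intuition behind the efficiency gains the abstract claims.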
Pages: 10157 - 10166
Number of pages: 10
Related Papers
50 records in total
  • [21] AltDiffusion: A Multilingual Text-to-Image Diffusion Model
    Ye, Fulong
    Liu, Guang
    Wu, Xinya
    Wu, Ledell
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6648 - 6656
  • [22] Controlling Text-to-Image Diffusion by Orthogonal Finetuning
    Qiu, Zeju
    Liu, Weiyang
    Feng, Haiwen
    Xue, Yuxuan
    Feng, Yao
    Liu, Zhen
    Zhang, Dan
    Weller, Adrian
    Schoelkopf, Bernhard
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
    Toker, Michael
    Orgad, Hadas
    Ventura, Mor
    Arad, Dana
    Belinkov, Yonatan
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 9713 - 9728
  • [24] Zero-Shot Text-to-Image Generation
    Ramesh, Aditya
    Pavlov, Mikhail
    Goh, Gabriel
    Gray, Scott
    Voss, Chelsea
    Radford, Alec
    Chen, Mark
    Sutskever, Ilya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [25] Dense Text-to-Image Generation with Attention Modulation
    Kim, Yunji
    Lee, Jiyoung
    Kim, Jin-Hwa
    Ha, Jung-Woo
    Zhu, Jun-Yan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7667 - 7677
  • [26] Visual Programming for Text-to-Image Generation and Evaluation
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [27] MirrorGAN: Learning Text-to-image Generation by Redescription
    Qiao, Tingting
    Zhang, Jing
    Xu, Duanqing
    Tao, Dacheng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1505 - 1514
  • [28] StyleDrop: Text-to-Image Generation in Any Style
    Sohn, Kihyuk
    Ruiz, Nataniel
    Lee, Kimin
    Chin, Daniel Castro
    Blok, Irina
    Chang, Huiwen
    Barber, Jarred
    Jiang, Lu
    Entis, Glenn
    Li, Yuanzhen
    Hao, Yuan
    Essa, Irfan
    Rubinstein, Michael
    Krishnan, Dilip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [29] A taxonomy of prompt modifiers for text-to-image generation
    Oppenlaender, Jonas
    BEHAVIOUR & INFORMATION TECHNOLOGY, 2024, 43 (15) : 3763 - 3776
  • [30] SINE: SINgle Image Editing with Text-to-Image Diffusion Models
    Zhang, Zhixing
    Han, Ligong
    Ghosh, Arnab
    Metaxas, Dimitris
    Ren, Jian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6027 - 6037