Shifted Diffusion for Text-to-image Generation

Cited by: 15

Authors:

Zhou, Yufan [1]

Liu, Bingchen [2]

Zhu, Yizhe [2]

Yang, Xiao [2]

Chen, Changyou [1]

Xu, Jinhui [1]

Affiliations:

[1] SUNY Buffalo, Buffalo, NY 14260 USA

[2] ByteDance, Beijing, People's Republic of China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.00979

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which generates better image embeddings from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method generates image embeddings from text more efficiently and more effectively, resulting in better text-to-image generation. Extensive large-scale experiments, evaluated with both quantitative measures and human evaluation, indicate that our method has a stronger generation ability than existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only some or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.

Pages: 10157-10166

Page count: 10
