Text to Image Generation with Conformer-GAN

Cited by: 0

Authors:

Deng, Zhiyu [1]

Yu, Wenxin [1]

Che, Lu [1]

Chen, Shiyu [1]

Zhang, Zhiqiang [1]

Shang, Jun [1]

Chen, Peng [2]

Gong, Jun [3]

Affiliations:

[1] Southwest Univ Sci & Technol, Mianyang, Sichuan, Peoples R China

[2] Chengdu Hongchengyun Technol Co Ltd, Chengdu, Peoples R China

[3] Southwest Automat Res Inst, Mianyang, Sichuan, Peoples R China

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2023, PT V | 2024 / Vol. 14451

Keywords:

Text-to-Image Synthesis; Computer Vision; Deep Learning; Generative Adversarial Networks;

DOI:

10.1007/978-981-99-8073-4_1

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 x 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
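The abstract describes a Transformer branch that attends to relevant words and fuses sentence- and word-level information to guide image features. The following is a minimal, standard-library sketch of that general idea, not the authors' implementation: it applies scaled dot-product attention from one image-region feature over word embeddings, then adds the attended word context and a sentence vector to the region feature. The function names (`attend_to_words`, `fuse`), the additive fusion rule, and the toy vectors are all assumptions made for illustration; the record does not specify how Conformer-GAN performs the fusion.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_to_words(region, words):
    """Scaled dot-product attention of one region feature over word embeddings.

    Returns (weights, context): one weight per word, and the weighted
    average of the word embeddings.
    """
    scale = math.sqrt(len(region))
    weights = softmax([dot(region, w) / scale for w in words])
    context = [
        sum(wt * w[i] for wt, w in zip(weights, words))
        for i in range(len(region))
    ]
    return weights, context


def fuse(region, words, sentence):
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    add the word context and the sentence vector to the region feature."""
    weights, context = attend_to_words(region, words)
    fused = [r + c + s for r, c, s in zip(region, context, sentence)]
    return weights, fused


# Toy inputs: one 2-d region feature, two word embeddings, one sentence vector.
region = [1.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]
sentence = [0.5, 0.5]
weights, fused = fuse(region, words, sentence)
```

In this toy case the region feature aligns with the first word embedding, so the first word receives the larger attention weight. A real model would use learned projections and multi-head attention over full feature maps.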

Citation

Pages: 3-14

Page count: 12
