Text to Image Generation with Conformer-GAN

Cited by: 0

Authors:

Deng, Zhiyu [1]

Yu, Wenxin [1]

Che, Lu [1]

Chen, Shiyu [1]

Zhang, Zhiqiang [1]

Shang, Jun [1]

Chen, Peng [2]

Gong, Jun [3]

Affiliations:

[1] Southwest Univ Sci & Technol, Mianyang, Sichuan, Peoples R China

[2] Chengdu Hongchengyun Technol Co Ltd, Chengdu, Peoples R China

[3] Southwest Automat Res Inst, Mianyang, Sichuan, Peoples R China

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2023, PT V | 2024 / Vol. 14451

Keywords:

Text-to-Image Synthesis; Computer Vision; Deep Learning; Generative Adversarial Networks

DOI:

10.1007/978-981-99-8073-4_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-to-image generation (T2I) has been a popular research field in recent years; its goal is to generate photorealistic images that match natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee semantic consistency between a given textual description and the generated natural images. To address this problem, we propose a concise and practical framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods in terms of both generated image quality and text-image semantic consistency.


Pages: 3-14

Page count: 12
