Text to Image Generation with Conformer-GAN

Authors
Deng, Zhiyu [1]
Yu, Wenxin [1]
Che, Lu [1]
Chen, Shiyu [1]
Zhang, Zhiqiang [1]
Shang, Jun [1]
Chen, Peng [2]
Gong, Jun [3]
Affiliations
[1] Southwest University of Science and Technology, Mianyang, Sichuan, People's Republic of China
[2] Chengdu Hongchengyun Technology Co., Ltd., Chengdu, People's Republic of China
[3] Southwest Automation Research Institute, Mianyang, Sichuan, People's Republic of China
Keywords
Text-to-Image Synthesis; Computer Vision; Deep Learning; Generative Adversarial Networks
DOI
10.1007/978-981-99-8073-4_1
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 x 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
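The abstract's idea of a branch that attends to relevant words and fuses word and sentence information into image features can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`word_attention`, `fuse`), the scaled dot-product scoring, and the additive fusion are all assumptions chosen for illustration, and the vectors are made-up two-dimensional toy data.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def word_attention(region, words):
    """Attend one image-region feature (query) over word embeddings.

    Hypothetical helper: returns the attention-weighted word context
    and the attention weights (one per word).
    """
    scale = math.sqrt(len(region))
    weights = softmax([dot(region, w) / scale for w in words])
    context = [
        sum(wt * w[i] for wt, w in zip(weights, words))
        for i in range(len(region))
    ]
    return context, weights

def fuse(region, context, sentence):
    """Assumed additive fusion of region, word context, and sentence vector."""
    return [r + c + s for r, c, s in zip(region, context, sentence)]

# Toy inputs: three word embeddings, one sentence embedding, one region feature.
words = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
sentence = [0.5, 0.5]
region = [4.0, 0.0]

context, weights = word_attention(region, words)
guided = fuse(region, context, sentence)
```

In this sketch the region feature is most similar to the first word, so that word receives the largest attention weight; a real model would learn the projections and apply the guidance at every spatial location of the convolutional feature map.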
Pages: 3-14
Page count: 12