Text to Image Generation with Conformer-GAN

Cited by: 0
Authors
Deng, Zhiyu [1]
Yu, Wenxin [1]
Che, Lu [1]
Chen, Shiyu [1]
Zhang, Zhiqiang [1]
Shang, Jun [1]
Chen, Peng [2]
Gong, Jun [3]
Affiliations
[1] Southwest Univ Sci & Technol, Mianyang, Sichuan, Peoples R China
[2] Chengdu Hongchengyun Technol Co Ltd, Chengdu, Peoples R China
[3] Southwest Automat Res Inst, Mianyang, Sichuan, Peoples R China
Keywords
Text-to-Image Synthesis; Computer Vision; Deep Learning; Generative Adversarial Networks
DOI
10.1007/978-981-99-8073-4_1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image (T2I) generation has been a popular research field in recent years; its goal is to generate photorealistic images that correspond to natural-language text descriptions. Most existing T2I models are based on generative adversarial networks, but guaranteeing semantic consistency between a given textual description and the generated images remains very challenging. To address this problem, we propose a concise and practical framework, Conformer-GAN. Specifically, we propose the Conformer block, which consists of a Convolutional Neural Network (CNN) branch and a Transformer branch. The CNN branch generates images conditionally from noise. The Transformer branch continuously focuses on the relevant words in the natural-language description and fuses sentence-level and word-level information to guide the CNN branch during image generation. Our approach better merges global and local representations, improving the semantic consistency between the textual information and the synthetic images. Importantly, Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods in both generated image quality and text-image semantic consistency.
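The abstract does not give the equations of the Conformer block, so the following is only a rough illustration of the general idea it describes: image regions attending over word embeddings, with the result fused with a sentence-level vector. It uses standard scaled dot-product attention as a stand-in. The function name `word_attention`, the additive fusion step, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_attention(region_feats, word_embs, sentence_emb):
    """For each image region, attend over words and add the sentence vector.

    All arguments are hypothetical stand-ins: lists of equal-length float
    vectors for regions and words, and a single vector for the sentence.
    Returns (guidance_vectors, attention_weights).
    """
    dim = len(sentence_emb)
    scale = 1.0 / math.sqrt(dim)
    guidance, weights = [], []
    for region in region_feats:
        # Scaled dot-product score between this region and every word.
        attn = softmax([dot(region, w) * scale for w in word_embs])
        # Word-level context: attention-weighted sum of the word embeddings.
        context = [
            sum(a * w[i] for a, w in zip(attn, word_embs)) for i in range(dim)
        ]
        # Assumed fusion: add the sentence-level vector to the word context.
        guidance.append([c + s for c, s in zip(context, sentence_emb)])
        weights.append(attn)
    return guidance, weights


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]   # toy image-region features
    words = [[4.0, 0.0], [0.0, 4.0]]     # toy word embeddings
    sentence = [0.5, 0.5]                # toy sentence embedding
    guidance, weights = word_attention(regions, words, sentence)
    print(weights)
    print(guidance)
```

In this toy setup each region weights the word it is most aligned with more heavily, which is the sense in which a branch can "focus on the relevant words"; how the actual model injects such guidance into its CNN branch is not specified in this record.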
Pages: 3 - 14
Number of pages: 12
Related papers
50 in total
  • [21] Trace Controlled Text to Image Generation
    Yan, Kun
    Ji, Lei
    Wu, Chenfei
    Bao, Jianmin
    Zhou, Ming
    Duan, Nan
    Ma, Shuai
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 59 - 75
  • [22] Text to Image Generation of Fashion Clothing
    Jain, Anish
    Modi, Diti
    Jikadra, Rudra
    Chachra, Shweta
    PROCEEDINGS OF THE 2019 6TH INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2019, : 355 - 358
  • [23] Paired-D++ GAN for image manipulation with text
    Duc Minh Vo
    Akihiro Sugimoto
    Machine Vision and Applications, 2022, 33
  • [24] Counterfactual GAN for debiased text-to-image synthesis
    Kong, Xianghua
    Xu, Ning
    Sun, Zefang
    Shen, Zhewen
    Zheng, Bolun
    Yan, Chenggang
    Cao, Jinbo
    Kang, Rongbao
    Liu, An-An
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [25] FA-GAN: FEATURE-AWARE GAN FOR TEXT TO IMAGE SYNTHESIS
    Jeon, Eunyeong
    Kim, Kunhee
    Kim, Daijin
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2443 - 2447
  • [26] CgT-GAN: CLIP-guided Text GAN for Image Captioning
    Yu, Jiarui
    Li, Haoran
    Hao, Yanbin
    Zhu, Bin
    Xu, Tong
    He, Xiangnan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2252 - 2263
  • [27] Counterfactual GAN for debiased text-to-image synthesis
    Xianghua Kong
    Ning Xu
    Zefang Sun
    Zhewen Shen
    Bolun Zheng
    Chenggang Yan
    Jinbo Cao
    Rongbao Kang
    An-An Liu
    Multimedia Systems, 2025, 31 (1)
  • [28] Controllable Text Layout Generation For Synthesizing Scene Text Image
    Chen, Huen
    He, Jiangyang
    Zhu, Anna
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 147 - 161
  • [29] Text-to-Image Generation Method Based on Image-Text Semantic Consistency
    Xue Z.
    Xu Z.
    Lang C.
    Feng S.
    Wang T.
    Li Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09) : 2180 - 2190
  • [30] Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation
    Wu, Xintian
    Zhao, Hanbin
    Zheng, Liangli
    Ding, Shouhong
    Li, Xi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1593 - 1602