Text to Image Generation with Conformer-GAN

Cited by: 0
Authors
Deng, Zhiyu [1]
Yu, Wenxin [1]
Che, Lu [1]
Chen, Shiyu [1]
Zhang, Zhiqiang [1]
Shang, Jun [1]
Chen, Peng [2]
Gong, Jun [3]
Affiliations
[1] Southwest Univ Sci & Technol, Mianyang, Sichuan, Peoples R China
[2] Chengdu Hongchengyun Technol Co Ltd, Chengdu, Peoples R China
[3] Southwest Automat Res Inst, Mianyang, Sichuan, Peoples R China
Keywords
Text-to-Image Synthesis; Computer Vision; Deep Learning; Generative Adversarial Networks;
DOI
10.1007/978-981-99-8073-4_1
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
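The abstract describes a branch that attends to relevant words and fuses word and sentence information to guide image features. The following is only a minimal sketch of that general idea, not the authors' implementation: it uses plain scaled dot-product attention, and the function names (`word_attention`, `fuse`), the additive fusion rule, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_attention(region_feats, word_embs):
    """For each image-region feature, attend over the word embeddings.

    Returns the attention weights (one row per region) and the
    attention-weighted word context vector for each region.
    """
    dim = len(word_embs[0])
    scale = math.sqrt(dim)
    all_weights, contexts = [], []
    for query in region_feats:
        weights = softmax([dot(query, w) / scale for w in word_embs])
        context = [
            sum(wt * w[d] for wt, w in zip(weights, word_embs))
            for d in range(dim)
        ]
        all_weights.append(weights)
        contexts.append(context)
    return all_weights, contexts


def fuse(region_feats, contexts, sentence_emb):
    """Hypothetical fusion rule (an assumption): add the word context and
    the global sentence embedding to each region feature."""
    return [
        [f + c + s for f, c, s in zip(feat, ctx, sentence_emb)]
        for feat, ctx in zip(region_feats, contexts)
    ]


# Toy inputs: 2 image regions, 3 words, feature dimension 2.
region_feats = [[1.0, 0.0], [0.0, 1.0]]
word_embs = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
sentence_emb = [0.5, 0.5]

weights, contexts = word_attention(region_feats, word_embs)
fused = fuse(region_feats, contexts, sentence_emb)
```

In this toy setup each region puts most of its attention on the word whose embedding points in the same direction, which stands in for the local (word-level) signal, while `sentence_emb` stands in for the global (sentence-level) signal.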
Pages: 3-14
Number of pages: 12
Related papers
50 in total
  • [31] COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization
    Zhou, Longlong
    Wu, Xiao-Jun
    Xu, Tianyang
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 385 - 396
  • [32] LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation
    Deng, Zijun
    He, Xiangteng
    Peng, Yuxin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [33] FLOW IMAGE GENERATION ALGORITHMS FOR IMPROVING GAN
    Zhang, Xianhong
    Li, Shusen
    JOURNAL OF FLOW VISUALIZATION AND IMAGE PROCESSING, 2021, 28 (01) : 45 - 59
  • [34] Bayesian optimization for conformer generation
    Chan, Lucian
    Hutchison, Geoffrey R.
    Morris, Garrett M.
    JOURNAL OF CHEMINFORMATICS, 2019, 11
  • [35] Image Classification and Generation Based on GAN Model
    Meng, Han
    Guo, Fangru
    2021 3RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI 2021), 2021, : 180 - 183
  • [36] Development and Classification of Image Dataset for Text-to-Image Generation
    Kumar M.
    Mittal M.
    Singh S.
    Journal of The Institution of Engineers (India): Series B, 2024, 105 (04) : 787 - 796
  • [37] IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation
    Deng, Kangle
    Fei, Tianyi
    Huang, Xin
    Peng, Yuxin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2216 - 2222
  • [38] Prompt Refinement with Image Pivot for Text-to-Image Generation
    Zhan, Jingtao
    Ai, Qingyao
    Liu, Yiqun
    Pan, Yingwei
    Yao, Ting
    Mao, Jiaxin
    Ma, Shaoping
    Mei, Tao
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 941 - 954
  • [39] Conformer generation under restraints
    de Bakker, PI
    Furnham, N
    Blundell, TL
    DePristo, MA
    CURRENT OPINION IN STRUCTURAL BIOLOGY, 2006, 16 (02) : 160 - 165
  • [40] Bayesian optimization for conformer generation
    Lucian Chan
    Geoffrey R. Hutchison
    Garrett M. Morris
    Journal of Cheminformatics, 11