Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis

Cited by: 34
Authors
Gao, Lianli [1 ]
Chen, Daiyuan [1 ]
Zhao, Zhou [2 ]
Shao, Jie [1 ]
Shen, Heng Tao [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Dept Comp Sci, Chengdu 611731, Peoples R China
[2] Zhejiang Univ, Sch Comp Sci, Hangzhou, Peoples R China
Keywords
Text-to-image synthesis; Conditional generative adversarial network (CGAN); Network complexity; Disentanglement process; Entanglement process; Information compensation; Pyramid attentive fusion
DOI
10.1016/j.patcog.2020.107384
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
The text-to-image synthesis task aims to generate photographic images conditioned on semantic text descriptions. To ensure the sharpness and fidelity of generated images, this task tends to require high-resolution outputs (e.g., 128² or 256²). However, as the resolution increases, the network parameters and complexity increase dramatically. Recent works introduce network structures with extensive parameters and heavy computation to guarantee the production of high-resolution images. As a result, these models suffer from unstable training and high training cost. To tackle these issues, in this paper we propose an effective information-compensation-based approach, namely Lightweight Dynamic Conditional GAN (LD-CGAN). LD-CGAN is a compact, structured single-stream network consisting of one generator and two independent discriminators that regularize and generate 64² and 128² images in one feed-forward pass. Specifically, the generator of LD-CGAN is composed of three major components: (1) Conditional Embedding (CE), an automatic unsupervised learning process that disentangles integrated semantic attributes in the text space; (2) the Conditional Manipulating Module (CM-M) in the Conditional Manipulating Block (CM-B), designed to continuously supply the image features with compensation information (i.e., the disentangled attributes); and (3) the Pyramid Attention Refine Block (PAR-B), which enriches multi-scale features by capturing spatial importance across multi-scale context. Experiments on two benchmark datasets, CUB and Oxford-102, show that our generated 128² images achieve performance comparable to the 256² images generated by state-of-the-art methods on two evaluation metrics: Inception Score (IS) and Visual-semantic Similarity (VS). 
Compared with the current state-of-the-art HDGAN, our LD-CGAN significantly decreases the number of parameters and computation time by 86.8% and 94.9%, respectively. (c) 2020 Elsevier Ltd. All rights reserved.
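The core idea behind the Pyramid Attention Refine Block (PAR-B) — fusing multi-scale feature maps by weighting each scale's contribution per spatial location — can be sketched roughly as below. This is a toy NumPy illustration, not the authors' implementation; the nearest-neighbour upsampling and the channel-mean attention scoring are assumptions made for brevity.

```python
import numpy as np

def pyramid_attentive_fusion(feats):
    """Fuse a pyramid of feature maps via per-location attention over scales.

    feats: list of arrays of shape (C, H_i, W_i), ordered from the finest
           resolution to coarser ones (each H_i, W_i divides the finest).
    Returns a fused map of shape (C, H, W) at the finest resolution.
    """
    C, H, W = feats[0].shape
    # Nearest-neighbour upsample every map to the finest resolution.
    ups = []
    for f in feats:
        _, h, w = f.shape
        ups.append(np.repeat(np.repeat(f, H // h, axis=1), W // w, axis=2))
    stack = np.stack(ups)                       # (S, C, H, W)
    # Importance score of each scale at each location (channel mean here).
    scores = stack.mean(axis=1)                 # (S, H, W)
    # Softmax across scales, so the weights at each pixel sum to 1.
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum over scales yields the attentively fused feature map.
    return (stack * weights[:, None]).sum(axis=0)   # (C, H, W)
```

With identical constant inputs at every scale the softmax weights are uniform and the fusion reduces to an average, which makes the sketch easy to sanity-check.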
Pages: 11
Related papers
50 records
  • [31] Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
    Zhang, Yang
    Tzun, Teoh Tze
    Hern, Lim Wei
    Kawaguchi, Kenji
    COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, 15144 : 70 - 86
  • [32] Mobile App for Text-to-Image Synthesis
    Kang, Ryan
    Sunil, Athira
    Chen, Min
    MOBILE COMPUTING, APPLICATIONS, AND SERVICES, MOBICASE 2019, 2019, 290 : 32 - 43
  • [33] Adversarial text-to-image synthesis: A review
    Frolov, Stanislav
    Hinz, Tobias
    Raue, Federico
    Hees, Joern
    Dengel, Andreas
    NEURAL NETWORKS, 2021, 144 : 187 - 209
  • [34] Optimizing and interpreting the latent space of the conditional text-to-image GANs
    Zhang, Zhenxing
    Schomaker, Lambert
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 : 2549 - 2572
  • [35] GMF-GAN: Gradual multi-granularity semantic fusion GAN for text-to-image synthesis
    Jin, Dehu
    Li, Guangju
    Yu, Qi
    Yu, Lan
    Cui, Jia
    Qi, Meng
    DIGITAL SIGNAL PROCESSING, 2023, 140
  • [36] SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis
    Peng, Dunlu
    Yang, Wuchen
    Liu, Cong
    Lu, Shuairui
    NEURAL NETWORKS, 2021, 138 : 57 - 67
  • [37] Optimizing and interpreting the latent space of the conditional text-to-image GANs
    Zhang, Zhenxing
    Schomaker, Lambert
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (05): : 2549 - 2572
  • [38] SF-GAN: Semantic fusion generative adversarial networks for text-to-image synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wanzeng
    Zhang, Jianhai
    Yao, Jinliang
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 262
  • [39] Language-vision matching for text-to-image synthesis with context-aware GAN
    Hou, Yingli
    Zhang, Wei
    Zhu, Zhiliang
    Yu, Hai
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [40] Optimized GAN for Text-to-Image Synthesis: Hybrid Whale Optimization Algorithm and Dragonfly Algorithm
    Talasila, Vamsidhar
    Narasingarao, M. R.
    Mohan, V. Murali
    ADVANCES IN ENGINEERING SOFTWARE, 2022, 173