Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis

Cited by: 34
Authors
Gao, Lianli [1 ]
Chen, Daiyuan [1 ]
Zhao, Zhou [2 ]
Shao, Jie [1 ]
Shen, Heng Tao [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Dept Comp Sci, Chengdu 611731, Peoples R China
[2] Zhejiang Univ, Sch Comp Sci, Hangzhou, Peoples R China
Keywords
Text-to-image synthesis; Conditional generative adversarial network (CGAN); Network complexity; Disentanglement process; Entanglement process; Information compensation; Pyramid attentive fusion
DOI
10.1016/j.patcog.2020.107384
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The text-to-image synthesis task aims to generate photographic images conditioned on semantic text descriptions. To ensure the sharpness and fidelity of generated images, this task tends to generate high-resolution images (e.g., 128² or 256²). However, as the resolution increases, the network parameters and complexity increase dramatically. Recent works introduce network structures with extensive parameters and heavy computation to guarantee the production of high-resolution images. As a result, these models suffer from unstable training and high training cost. To tackle these issues, in this paper, we propose an effective information-compensation-based approach, namely Lightweight Dynamic Conditional GAN (LD-CGAN). LD-CGAN is a compact, structured single-stream network consisting of one generator and two independent discriminators that regularize and generate 64² and 128² images in one feed-forward pass. Specifically, the generator of LD-CGAN is composed of three major components: (1) Conditional Embedding (CE), an automatic unsupervised learning process that disentangles integrated semantic attributes in the text space; (2) the Conditional Manipulating Modular (CM-M) in the Conditional Manipulating Block (CM-B), which continuously provides the image features with compensation information (i.e., the disentangled attributes); and (3) the Pyramid Attention Refine Block (PAR-B), which enriches multi-scale features by capturing spatial importance across multi-scale context. Experiments on two benchmark datasets, CUB and Oxford-102, show that our generated 128² images achieve performance comparable to the 256² images produced by state-of-the-art methods on two evaluation metrics: Inception Score (IS) and Visual-Semantic similarity (VS).
Compared with the current state-of-the-art HDGAN, our LD-CGAN decreases the number of parameters and the computation time by 86.8% and 94.9%, respectively. (c) 2020 Elsevier Ltd. All rights reserved.
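The abstract describes the PAR-B as fusing multi-scale features by weighting their spatial importance. The paper's actual layer configuration is not given here, so the following NumPy sketch only illustrates the general idea of pyramid attentive fusion: build a coarser level from the input feature map, score each level per spatial location, and blend the levels with softmax attention weights. All function names are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool2(f):
    # 2x2 average pooling over a (C, H, W) map (assumes even H, W)
    c, h, w = f.shape
    return f.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(f):
    # nearest-neighbour upsampling by a factor of 2
    return f.repeat(2, axis=1).repeat(2, axis=2)

def pyramid_attentive_fusion(feat):
    """Illustrative sketch: form a two-level pyramid from `feat`
    (C, H, W), derive a per-level spatial importance score, and fuse
    the levels with per-location softmax attention weights."""
    coarse = upsample2(avg_pool2(feat))         # coarser context, same size
    levels = np.stack([feat, coarse])           # (2, C, H, W)
    scores = levels.mean(axis=1)                # (2, H, W) per-level saliency
    weights = softmax(scores, axis=0)[:, None]  # (2, 1, H, W), sums to 1 over levels
    return (weights * levels).sum(axis=0)       # (C, H, W) fused features

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
y = pyramid_attentive_fusion(x)
print(y.shape)  # (8, 4, 4) — fused map keeps the input shape
```

In the paper's setting the pyramid levels and attention weights would be learned inside the generator rather than computed from fixed pooling, but the fusion pattern (softmax-weighted sum over scales at each spatial position) is the same.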
Pages: 11
Related papers: 50 records
  • [1] ResFPA-GAN: Text-to-Image Synthesis with Generative Adversarial Network Based on Residual Block Feature Pyramid Attention
    Sun, Jingcong
    Zhou, Yimin
    Zhang, Bin
    2019 IEEE INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS (ARSO), 2019, : 317 - 322
  • [2] Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis
    Gao, Lianli
    Chen, Daiyuan
    Song, Jingkuan
    Xu, Xing
    Zhang, Dongxiang
    Shen, Heng Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8312 - 8319
  • [3] DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
    Ruan, Shulan
    Zhang, Yong
    Zhang, Kun
    Fan, Yanbo
    Tang, Fan
    Liu, Qi
    Chen, Enhong
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13940 - 13949
  • [4] Counterfactual GAN for debiased text-to-image synthesis
    Kong, Xianghua
    Xu, Ning
    Sun, Zefang
    Shen, Zhewen
    Zheng, Bolun
    Yan, Chenggang
    Cao, Jinbo
    Kang, Rongbao
    Liu, An-An
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [5] Grounded Text-to-Image Synthesis with Attention Refocusing
    Phung, Quynh
    Ge, Songwei
    Huang, Jia-Bin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7932 - 7942
  • [6] DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
    Zhu, Minfeng
    Pan, Pingbo
    Chen, Wei
    Yang, Yi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 5795 - 5803
  • [7] ARRPNGAN: Text-to-image GAN with attention regularization and region proposal networks
    Quan, Fengnan
    Lang, Bo
    Liu, Yanxi
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2022, 106
  • [8] Neural Architecture Search With a Lightweight Transformer for Text-to-Image Synthesis
    Li, Wei
    Wen, Shiping
    Shi, Kaibo
    Yang, Yin
    Huang, Tingwen
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (03): : 1567 - 1576
  • [9] Enhanced Text-to-Image Synthesis Conditional Generative Adversarial Networks
    Tan, Yong Xuan
    Lee, Chin Poo
    Neo, Mai
    Lim, Kian Ming
    Lim, Jit Yan
    IAENG International Journal of Computer Science, 2022, 49 (01) : 1 - 7
  • [10] PCCM-GAN: Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model
    Qi, Zhongjian
    Sun, Jun
    Qian, Jinzhao
    Xu, Jiajia
    Zhan, Shu
    NEUROCOMPUTING, 2021, 449 : 330 - 341