Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis

Cited by: 34
Authors
Gao, Lianli [1 ]
Chen, Daiyuan [1 ]
Zhao, Zhou [2 ]
Shao, Jie [1 ]
Shen, Heng Tao [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Dept Comp Sci, Chengdu 611731, Peoples R China
[2] Zhejiang Univ, Sch Comp Sci, Hangzhou, Peoples R China
Keywords
Text-to-image synthesis; Conditional generative adversarial network (CGAN); Network complexity; Disentanglement process; Entanglement process; Information compensation; Pyramid attentive fusion
DOI
10.1016/j.patcog.2020.107384
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The text-to-image synthesis task aims to generate photographic images conditioned on semantic text descriptions. To ensure the sharpness and fidelity of generated images, this task tends to generate high-resolution images (e.g., 128² or 256²). However, as the resolution increases, the network parameters and complexity increase dramatically. Recent works introduce network structures with extensive parameters and heavy computations to guarantee the production of high-resolution images, and as a result these models suffer from an unstable training process and high training cost. To tackle these issues, we propose an effective information-compensation-based approach, namely the Lightweight Dynamic Conditional GAN (LD-CGAN). LD-CGAN is a compact, structured single-stream network consisting of one generator and two independent discriminators, which regularize and generate 64² and 128² images in a single feed-forward pass. Specifically, the generator of LD-CGAN is composed of three major components: (1) Conditional Embedding (CE), an automatic unsupervised learning process that disentangles integrated semantic attributes in the text space; (2) the Conditional Manipulating Modular (CM-M) inside the Conditional Manipulating Block (CM-B), which continuously supplies the image features with compensation information (i.e., the disentangled attributes); and (3) the Pyramid Attention Refine Block (PAR-B), which enriches multi-scale features by capturing spatial importance across multi-scale contexts. Experiments on two benchmark datasets, CUB and Oxford-102, show that our generated 128² images achieve performance comparable to the 256² images generated by state-of-the-art methods on two evaluation metrics: Inception Score (IS) and Visual-Semantic Similarity (VS). Compared with the current state-of-the-art HDGAN, our LD-CGAN decreases the number of parameters and the computation time by 86.8% and 94.9%, respectively. (c) 2020 Elsevier Ltd. All rights reserved.
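The record contains no code, so the following is a minimal PyTorch sketch of one plausible reading of the Pyramid Attention Refine Block (PAR-B): multi-scale feature maps are brought to a common resolution, a per-scale 1x1 convolution scores spatial importance, and a softmax across scales produces per-pixel fusion weights. All names (`PyramidAttentionRefineBlock`, `score`, `refine`) and design details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidAttentionRefineBlock(nn.Module):
    """Hypothetical pyramid attentive fusion: a per-pixel softmax across
    scales decides how much each scale contributes to the fused map."""

    def __init__(self, channels, num_scales=3):
        super().__init__()
        # one 1x1 conv per scale predicting a single-channel spatial score map
        self.score = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_scales)
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C, H_i, W_i) feature maps, coarse to fine
        target = feats[-1].shape[-2:]  # fuse at the finest resolution
        resized, scores = [], []
        for f, score in zip(feats, self.score):
            f = F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            resized.append(f)
            scores.append(score(f))  # (B, 1, H, W) spatial importance
        # softmax over the scale dimension -> per-pixel weights summing to 1
        w = torch.softmax(torch.stack(scores), dim=0)
        fused = sum(w_i * f_i for w_i, f_i in zip(w, resized))
        return self.refine(fused)

if __name__ == "__main__":
    block = PyramidAttentionRefineBlock(channels=64)
    feats = [torch.randn(2, 64, s, s) for s in (16, 32, 64)]
    print(block(feats).shape)  # torch.Size([2, 64, 64, 64])
```

Feeding feature maps at, e.g., 16², 32², and 64² yields a fused 64² map; in LD-CGAN the fused multi-scale features would presumably feed the image-generation heads, though the exact wiring is not specified in this record.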
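Similarly, the single-stream design (one generator emitting 64² and 128² images in one feed-forward pass, each judged by an independent discriminator) can be sketched as a training step. The non-saturating BCE losses and the `G`, `D64`, `D128` interfaces below are assumptions for illustration; the paper's actual objectives and text conditioning are not given in this record.

```python
import torch
import torch.nn.functional as F

def train_step(G, D64, D128, opt_g, opt_d, text_emb, real64, real128, z_dim=100):
    """One hypothetical LD-CGAN-style step: a single generator forward
    yields images at both scales; each scale has its own discriminator."""
    z = torch.randn(real64.size(0), z_dim, device=real64.device)
    fake64, fake128 = G(z, text_emb)  # one feed-forward pass, two outputs

    # --- discriminators: real vs. detached fakes at each scale ---
    opt_d.zero_grad()
    d_loss = 0.0
    for D, real, fake in ((D64, real64, fake64), (D128, real128, fake128)):
        real_logits, fake_logits = D(real), D(fake.detach())
        d_loss = d_loss + (
            F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
        )
    d_loss.backward()
    opt_d.step()

    # --- generator: fool both discriminators at once ---
    opt_g.zero_grad()
    g_loss = sum(
        F.binary_cross_entropy_with_logits(D(fake), torch.ones_like(D(fake)))
        for D, fake in ((D64, fake64), (D128, fake128))
    )
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Regularizing both resolutions in one pass is what lets a compact single-stream generator replace the stacked multi-stage pipelines the abstract criticizes for parameter count and training instability.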
Pages: 11
Related Papers
50 records in total
  • [41] Dual conditional GAN based on external attention for semantic image synthesis
    Liu, Gang
    Zhou, Qijun
    Xie, Xiaoxiao
    Yu, Qingchen
    CONNECTION SCIENCE, 2023, 35 (01)
  • [42] DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
    Huang, Mengqi
    Mao, Zhendong
    Wang, Penghui
    Wang, Quan
    Zhang, Yongdong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4345 - 4354
  • [43] A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
    Agarwal, Aishwarya
    Karanam, Srikrishna
    Joseph, K. J.
    Saxena, Apoorv
    Goswami, Koustava
    Srinivasan, Balaji Vasan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2023, : 2283 - 2293
  • [44] Scaling up GANs for Text-to-Image Synthesis
    Kang, Minguk
    Zhu, Jun-Yan
    Zhang, Richard
    Park, Jaesik
    Shechtman, Eli
    Paris, Sylvain
    Park, Taesung
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10124 - 10134
  • [45] Efficient Neural Architecture for Text-to-Image Synthesis
    Souza, Douglas M.
    Wehrmann, Jonatas
    Ruiz, Duncan D.
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [46] DR-GAN: Distribution Regularization for Text-to-Image Generation
    Tan, Hongchen
    Liu, Xiuping
    Yin, Baocai
    Li, Xin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10309 - 10323
  • [47] CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis
    Ye, Xihong
    Lu, Lu
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [48] SCENE RETRIEVAL FOR VIDEO SUMMARIZATION BASED ON TEXT-TO-IMAGE GAN
    Yanagi, Rintaro
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1825 - 1829
  • [49] ISF-GAN: Imagine, Select, and Fuse with GPT-based Text Enrichment for Text-to-image Synthesis
    Sheng, Yefei
    Tao, Ming
    Wang, Jie
    Bao, Bing-Kun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (07) : 1 - 17
  • [50] Joint Embedding based Text-to-Image Synthesis
    Wang, Menglan
    Yu, Yue
    Li, Benyuan
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 432 - 436