Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation.

Cited: 4
Authors
Wu, Xintian [1]
Zhao, Hanbin [1]
Zheng, Liangli [1]
Ding, Shouhong [2]
Li, Xi [1,3]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Tencent, Youtu Lab, Shanghai, Peoples R China
[3] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text-to-image generation; attribute memory; sample-aware; sample-joint; cross-modal alignment;
DOI
10.1145/3503161.3547821
Chinese Library Classification
TP39 [Applications of Computers];
Subject Classification Code
081203; 0835;
Abstract
As a challenging task, text-to-image generation aims to generate photo-realistic and semantically consistent images according to given text descriptions. Existing methods mainly extract text information from a single sentence to represent an image, and this text representation strongly affects the quality of the generated image. However, directly utilizing the limited information in one sentence omits key attribute descriptions, which are crucial for describing an image accurately. To alleviate this problem, we propose an effective text representation method that complements the sentence with attribute information. Firstly, we construct an attribute memory to jointly control text-to-image generation together with the sentence input. Secondly, we explore two update mechanisms, sample-aware and sample-joint, to dynamically optimize a generalized attribute memory. Furthermore, we design an attribute-sentence-joint conditional generator learning scheme to align the feature embeddings among multiple representations, which facilitates cross-modal network training. Experimental results show that the proposed method obtains substantial performance improvements on both the CUB (FID from 14.81 to 8.57) and COCO (FID from 21.42 to 12.39) datasets.
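The abstract describes an attribute memory that is read jointly with the sentence embedding to condition the generator. The sketch below is only a minimal illustration of that general idea (a learnable attribute memory queried by attention on the sentence embedding and fused into a joint condition vector); all module names, dimensions, and the fusion rule are assumptions made for illustration and do not reproduce the authors' Adma-GAN implementation or its sample-aware/sample-joint memory update mechanisms.

```python
# Illustrative sketch only (hypothetical names and sizes), not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributeMemoryConditioner(nn.Module):
    def __init__(self, num_attrs=200, attr_dim=256, sent_dim=256, cond_dim=256):
        super().__init__()
        # Learnable memory with one slot per attribute (size is an assumption).
        self.memory = nn.Parameter(torch.randn(num_attrs, attr_dim) * 0.02)
        self.query = nn.Linear(sent_dim, attr_dim)             # sentence -> memory query
        self.fuse = nn.Linear(sent_dim + attr_dim, cond_dim)   # joint condition

    def forward(self, sent_emb):
        # sent_emb: (B, sent_dim) sentence embedding from a text encoder.
        q = self.query(sent_emb)                                          # (B, attr_dim)
        attn = F.softmax(q @ self.memory.t() / q.size(-1) ** 0.5, dim=-1)  # (B, num_attrs)
        attr_read = attn @ self.memory                                    # (B, attr_dim)
        # Sentence and retrieved attribute features jointly condition the generator.
        return self.fuse(torch.cat([sent_emb, attr_read], dim=-1))


if __name__ == "__main__":
    cond = AttributeMemoryConditioner()
    sent = torch.randn(4, 256)               # batch of sentence embeddings
    z = torch.randn(4, 100)                  # noise vector for the generator
    c = cond(sent)                           # (4, 256) attribute-sentence-joint condition
    gen_input = torch.cat([z, c], dim=-1)    # would feed a GAN generator stage
    print(gen_input.shape)                   # torch.Size([4, 356])
```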
Pages: 1593-1602
Page count: 10
Related Papers
18 entries in total
  • [1] AMM-GAN: Attribute-Matching Memory for Person Text-to-Image Generation
    Yue, Wei
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 146 - 158
  • [2] DR-GAN: Distribution Regularization for Text-to-Image Generation
    Tan, Hongchen
    Liu, Xiuping
    Yin, Baocai
    Li, Xin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10309 - 10323
  • [3] AraBERT and DF-GAN fusion for Arabic text-to-image generation
    Bahani, Mourad
    El Ouaazizi, Aziza
    Maalmi, Khalil
    ARRAY, 2022, 16
  • [4] Stacking VAE and GAN for Context-aware Text-to-Image Generation
    Zhang, Chenrui
    Peng, Yuxin
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [5] Subject-driven Text-to-Image Generation via Apprenticeship Learning
    Chen, Wenhu
    Hu, Hexiang
    Li, Yandong
    Ruiz, Nataniel
    Jia, Xuhui
    Chang, Ming-Wei
    Cohen, William W.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
    Zhu, Minfeng
    Pan, Pingbo
    Chen, Wei
    Yang, Yi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 5795 - 5803
  • [7] PCCM-GAN: Photographic Text-to-Image Generation with Pyramid Contrastive Consistency Model
    Zhongjian, Q.
    Sun, Jun
    Qian, Jinzhao
    Xu, Jiajia
    Zhan, Shu
    NEUROCOMPUTING, 2021, 449 : 330 - 341
  • [8] SAW-GAN: Multi-granularity Text Fusion Generative Adversarial Networks for text-to-image generation
    Jin, Dehu
    Yu, Qi
    Yu, Lan
    Qi, Meng
    KNOWLEDGE-BASED SYSTEMS, 2024, 294
  • [9] DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
    Wang, Zhiwei
    Yang, Jing
    Cui, Jiajun
    Liu, Jiawei
    Wang, Jiahao
    COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 3 - 19