Dual Adversarial Inference for Text-to-Image Synthesis

Cited by: 19
Authors
Lao, Qicheng [1 ,2 ]
Havaei, Mohammad [1 ]
Pesaranghader, Ahmad [1 ,3 ]
Dutil, Francis [1 ]
Di Jorio, Lisa [1 ]
Fevens, Thomas [2 ]
Affiliations
[1] Imagia Inc, Montreal, PQ, Canada
[2] Concordia Univ, Montreal, PQ, Canada
[3] Dalhousie Univ, Halifax, NS, Canada
DOI
10.1109/ICCV.2019.00766
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Synthesizing images from a given text description involves two types of information: the content, which is explicitly described in the text (e.g., color, composition), and the style, which is usually not well described in the text (e.g., location, quantity, size). Previous works, however, typically treat text-to-image synthesis as generating images from the content alone, without learning meaningful style representations. In this paper, we aim to learn two variables that are disentangled in the latent space, representing content and style respectively. We achieve this by augmenting current text-to-image synthesis frameworks with a dual adversarial inference mechanism. Through extensive experiments, we show that our model learns, in an unsupervised manner, style representations corresponding to meaningful information present in the image that is not well described in the text. The new framework also improves the quality of synthesized images when evaluated on the Oxford-102, CUB, and COCO datasets.
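The abstract describes conditioning a generator on two disentangled latent codes: a content code derived from the text and a style code sampled from a prior, with inference networks mapping images back to both codes. The sketch below illustrates only that factorization of the generator input; every name, dimension, and the identity content encoder are assumptions for illustration, not the authors' implementation.

```python
import random


def encode_content(text_embedding):
    # Content code: a deterministic function of the text embedding.
    # (Identity mapping here purely for illustration.)
    return list(text_embedding)


def sample_style(dim, seed=0):
    # Style code: drawn from a Gaussian prior, independent of the text,
    # so it can absorb factors the text does not describe.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generator_input(text_embedding, style_dim=4):
    # The generator would consume the concatenation [content; style];
    # adversarial inference then encourages each factor to stay meaningful
    # by matching (image, content, style) joints in both directions.
    content = encode_content(text_embedding)
    style = sample_style(style_dim)
    return content + style


z = generator_input([0.2, 0.5, 0.1])
print(len(z))  # content dim (3) + style dim (4) = 7
```

With a fixed seed the style draw is reproducible; in training, each sample would use a fresh draw so the style code varies while the content code stays tied to the caption.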
Pages: 7566-7575
Page count: 10
Related papers (50 in total)
  • [41] DE-GAN: Text-to-image synthesis with dual and efficient fusion model
    Jiang, Bin
    Zeng, Weiyuan
    Yang, Chao
    Wang, Renjun
    Zhang, Bolin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 23839 - 23852
  • [42] DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis
    Yang, Bing
    Xiang, Xueqin
    Kong, Wangzeng
    Zhang, Jianhai
    Peng, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6956 - 6967
  • [44] KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis
    Tan, Hongchen
    Liu, Xiuping
    Liu, Meng
    Yin, Baocai
    Li, Xin
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 1275 - 1290
  • [45] Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis
    Yang, Yanhua
    Wang, Lei
    Xie, De
    Deng, Cheng
    Tao, Dacheng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2798 - 2809
  • [46] Text-to-image synthesis with self-supervised bi-stage generative adversarial network
    Tan, Yong Xuan
    Lee, Chin Poo
    Neo, Mai
    Lim, Kian Ming
    Lim, Jit Yan
    [J]. PATTERN RECOGNITION LETTERS, 2023, 169 : 43 - 49
  • [48] Instance Mask Embedding and Attribute-Adaptive Generative Adversarial Network for Text-to-Image Synthesis
    Ni, Jiancheng
    Zhang, Susu
    Zhou, Zili
    Hou, Jie
    Gao, Feng
    [J]. IEEE ACCESS, 2020, 8 (8) : 37697 - 37711
  • [49] Scaling up GANs for Text-to-Image Synthesis
    Kang, Minguk
    Zhu, Jun-Yan
    Zhang, Richard
    Park, Jaesik
    Shechtman, Eli
    Paris, Sylvain
    Park, Taesung
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 10124 - 10134
  • [50] Efficient Neural Architecture for Text-to-Image Synthesis
    Souza, Douglas M.
    Wehrmann, Jonatas
    Ruiz, Duncan D.
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020.