TT2INet: Text to Photo-realistic Image Synthesis with Transformer as Text Encoder

Cited by: 0
Authors
Zhu, Jianwei [1]
Li, Zhixin [1]
Ma, Huifang [2]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformer; Generative Adversarial Networks (GANs); spectral normalization; self-attention;
DOI
10.1109/IJCNN52387.2021.9534074
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A text-to-image (T2I) generation method is mainly evaluated on two aspects: the quality and diversity of the generated images, and the semantic consistency between the generated images and the input sentences. Feature extraction from the text is therefore a very important part of the method. In this paper, we propose a Transformer-based Text-to-Image Network (TT2INet). We use a pre-trained Transformer model (ALBERT) to extract sentence feature vectors and word feature vectors from the input sentences, which serve as the basis for the Generative Adversarial Networks (GANs) to generate images. In addition, we add a self-attention mechanism and spectral normalization to the model. The self-attention mechanism lets the model attend to more local features when generating images, and spectral normalization makes the training of GANs more stable. The Inception Scores of our method on the Oxford-102, CUB and COCO datasets are 3.90, 4.89 and 26.53, and the R-precision scores are 92.55, 87.72 and 92.29, respectively.
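The abstract names spectral normalization as the technique used to stabilize GAN training. As a rough illustration of the general idea only (not the authors' implementation), the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the weights by it. The function names (`spectral_normalize`, `matvec`, `l2_normalize`), the iteration count, and the toy matrix are all assumptions made for this example; deep learning frameworks typically ship their own version of this operation.

```python
import math
import random


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def transpose(w):
    """Return the transpose of matrix w as a list of rows."""
    return [list(col) for col in zip(*w)]


def l2_normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_normalize(w, n_iters=50, seed=0):
    """Return (w / sigma, sigma), with sigma estimated by power iteration.

    sigma approximates the largest singular value of w. Hypothetical helper
    for illustration; assumes w is non-zero and n_iters >= 1.
    """
    rng = random.Random(seed)
    wt = transpose(w)
    # Random starting vector with one entry per row of w.
    u = l2_normalize([rng.gauss(0.0, 1.0) for _ in w])
    for _ in range(n_iters):
        v = l2_normalize(matvec(wt, u))  # v is proportional to W^T u
        u = l2_normalize(matvec(w, v))   # u is proportional to W v
    # sigma = u^T W v
    sigma = sum(ui * x for ui, x in zip(u, matvec(w, v)))
    return [[x / sigma for x in row] for row in w], sigma


if __name__ == "__main__":
    weights = [[3.0, 0.0], [0.0, 1.0]]  # toy matrix, singular values 3 and 1
    normalized, sigma = spectral_normalize(weights)
    print(sigma)
    print(normalized)
```

After normalization the matrix has a largest singular value of about 1, which bounds how much the layer can stretch its input. Real implementations usually run a single power-iteration step per training update and reuse `u` between steps instead of iterating to convergence as done here.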
Pages: 8