Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation

Cited by: 7
Authors
Tan, Zhaorui [1 ,2 ]
Yang, Xi [1 ]
Ye, Zihan [1 ]
Wang, Qiufeng [1 ]
Yan, Yuyao [1 ]
Nguyen, Anh [2 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Suzhou, Jiangsu, Peoples R China
[2] Univ Liverpool, Liverpool, England
[3] Duke Kunshan Univ, Suzhou, Jiangsu, Peoples R China
Keywords
Text-to-image; Image generation; Generative adversarial networks; Semantic consistency;
DOI
10.1016/j.patcog.2023.109883
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, leading to misleading semantics in generated images. Despite its significance, designing a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, PDF-GAN yields remarkable improvements in the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.
Pages: 11