Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation

Cited by: 7

Authors:

Tan, Zhaorui [1,2]

Yang, Xi [1]

Ye, Zihan [1]

Wang, Qiufeng [1]

Yan, Yuyao [1]

Nguyen, Anh [2]

Huang, Kaizhu [3]

Affiliations:

[1] Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, People's Republic of China

[2] University of Liverpool, Liverpool, England

[3] Duke Kunshan University, Suzhou, Jiangsu, People's Republic of China

Source:

Pattern Recognition | 2023 / Vol. 144

Keywords:

Text-to-image; Image generation; Generative adversarial networks; Semantic consistency

DOI:

10.1016/j.patcog.2023.109883

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, leading to misleading semantics in generated images. Despite its significance, designing a better text-image consistency metric remains surprisingly under-explored in the community. In this paper, we take a further step and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, PDF-GAN can markedly improve the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.

Pages: 11
