Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation

Citations: 0

Authors:

Xiong, Peixi [1]

Kozuch, Michael [1]

Jain, Nilesh [1]

Affiliations:

[1] Intel Labs, Portland, OR 97229 USA

Source:

COMPUTER VISION - ECCV 2024, PT V | 2025 / Vol. 15063

Keywords:

Text-to-Image Generation; Structural Reasoning; Relational Understanding

DOI:

10.1007/978-3-031-72652-1_19

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-to-image generation plays a pivotal role in computer vision and natural language processing by translating textual descriptions into visual representations. However, understanding complex relations in detailed text prompts filled with rich relational content remains a significant challenge. To address this, we introduce a novel task: Logic-Rich Text-to-Image generation. Unlike conventional image generation tasks that rely on short and structurally simple natural language inputs, our task focuses on intricate text inputs abundant in relational information. To tackle these complexities, we collect the Textual-Visual Logic dataset, designed to evaluate the performance of text-to-image generation models across diverse and complex scenarios. Furthermore, we propose a baseline model as a benchmark for this task. Our model comprises three key components: a relation understanding module, a multi-modality fusion module, and a negative pair discriminator. These components enhance the model's ability to handle disturbances in informative tokens and prioritize relational elements during image generation. Project repository: https://github.com/IntelLabs/Textual-Visual-Logic-Challenge

Citation

Pages: 318-334

Page count: 17

50 entries in total

[1] ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
Wei, Yuxiang
Zhang, Yabo
Ji, Zhilong
Bai, Jinfeng
Zhang, Lei
Zuo, Wangmeng
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 15897-15907
[2] Visual Programming for Text-to-Image Generation and Evaluation
Cho, Jaemin
Zala, Abhay
Bansal, Mohit
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[3] Controllable Text-to-Image Generation
Li, Bowen
Qi, Xiaojuan
Lukasiewicz, Thomas
Torr, Philip H. S.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[4] Surgical text-to-image generation
Nwoye, Chinedu Innocent
Bose, Rupak
Elgohary, Kareem
Arboit, Lorenzo
Carlino, Giorgio
Lavanchy, Joel L.
Mascagni, Pietro
Padoy, Nicolas
PATTERN RECOGNITION LETTERS, 2025, 190: 73-80
[5] Expressive Text-to-Image Generation with Rich Text
Ge, Songwei
Park, Taesung
Zhu, Jun-Yan
Huang, Jia-Bin
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 7511-7522
[6] Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation
Park, Seongbeom
Moon, Suhong
Park, Seunghyun
Kim, Jinkyu
2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024: 4663-4672
[7] Visual question answering based evaluation metrics for text-to-image generation
Miyamoto, Mizuki
Morita, Ryugo
Zhou, Jinjia
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
[8] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
Sah, Shagan
Peri, Dheeraj
Shringi, Ameya
Zhang, Chi
Dominguez, Miguel
Savakis, Andreas
Ptucha, Ray
2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018: 3783-3787
[9] Shifted Diffusion for Text-to-image Generation
Zhou, Yufan
Liu, Bingchen
Zhu, Yizhe
Yang, Xiao
Chen, Changyou
Xu, Jinhui
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10157-10166
[10] Text-to-Image Generation for Abstract Concepts
Liao, Jiayi
Chen, Xu
Fu, Qiang
Du, Lun
He, Xiangnan
Wang, Xiang
Han, Shi
Zhang, Dongmei
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024: 3360-3368
