Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation

被引:0
|
作者
Xiong, Peixi [1 ]
Kozuch, Michael [1 ]
Jain, Nilesh [1 ]
机构
[1] Intel Labs, Portland, OR 97229 USA
来源
关键词
Text-to-Image Generation; Structural Reasoning; Relational Understanding;
D O I
10.1007/978-3-031-72652-1_19
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text-to-image generation plays a pivotal role in computer vision and natural language processing by translating textual descriptions into visual representations. However, understanding complex relations in detailed text prompts filled with rich relational content remains a significant challenge. To address this, we introduce a novel task: Logic-Rich Text-to-Image generation. Unlike conventional image generation tasks that rely on short and structurally simple natural language inputs, our task focuses on intricate text inputs abundant in relational information. To tackle these complexities, we collect the Textual-Visual Logic dataset, designed to evaluate the performance of text-to-image generation models across diverse and complex scenarios. Furthermore, we propose a baseline model as a benchmark for this task. Our model comprises three key components: a relation understanding module, a multi-modality fusion module, and a negative pair discriminator. These components enhance the model's ability to handle disturbances in informative tokens and prioritize relational elements during image generation https://github.com/IntelLabs/Textual-Visual-Logic-Challenge.
引用
收藏
页码:318 / 334
页数:17
相关论文
共 50 条
  • [1] ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
    Wei, Yuxiang
    Zhang, Yabo
    Ji, Zhilong
    Bai, Jinfeng
    Zhang, Lei
    Zuo, Wangmeng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15897 - 15907
  • [2] Visual Programming for Text-to-Image Generation and Evaluation
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
  • [5] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7511 - 7522
  • [6] Localization and Manipulation of Immoral Visual Cues for Safe Text-to-Image Generation
    Park, Seongbeom
    Moon, Suhong
    Park, Seunghyun
    Kim, Jinkyu
    2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024, : 4663 - 4672
  • [7] Visual question answering based evaluation metrics for text-to-image generation
    Miyamoto, Mizuki
    Morita, Ryugo
    Zhou, Jinjia
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [8] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
    Sah, Shagan
    Peri, Dheeraj
    Shringi, Ameya
    Zhang, Chi
    Dominguez, Miguel
    Savakis, Andreas
    Ptucha, Ray
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 3783 - 3787
  • [9] Shifted Diffusion for Text-to-image Generation
    Zhou, Yufan
    Liu, Bingchen
    Zhu, Yizhe
    Yang, Xiao
    Chen, Changyou
    Xu, Jinhui
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10157 - 10166
  • [10] Text-to-Image Generation for Abstract Concepts
    Liao, Jiayi
    Chen, Xu
    Fu, Qiang
    Du, Lun
    He, Xiangnan
    Wang, Xiang
    Han, Shi
    Zhang, Dongmei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3360 - 3368