Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation

Cited by: 0
|
Authors
Xiong, Peixi [1 ]
Kozuch, Michael [1 ]
Jain, Nilesh [1 ]
Affiliations
[1] Intel Labs, Portland, OR 97229 USA
Source
COMPUTER VISION - ECCV 2024, PT V | 2025 / Vol. 15063
Keywords
Text-to-Image Generation; Structural Reasoning; Relational Understanding
DOI
10.1007/978-3-031-72652-1_19
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Text-to-image generation plays a pivotal role in computer vision and natural language processing by translating textual descriptions into visual representations. However, understanding complex relations in detailed text prompts filled with rich relational content remains a significant challenge. To address this, we introduce a novel task: Logic-Rich Text-to-Image generation. Unlike conventional image generation tasks that rely on short and structurally simple natural language inputs, our task focuses on intricate text inputs abundant in relational information. To tackle these complexities, we collect the Textual-Visual Logic dataset, designed to evaluate the performance of text-to-image generation models across diverse and complex scenarios. Furthermore, we propose a baseline model as a benchmark for this task. Our model comprises three key components: a relation understanding module, a multi-modality fusion module, and a negative pair discriminator. These components enhance the model's ability to handle disturbances in informative tokens and to prioritize relational elements during image generation. Code is available at https://github.com/IntelLabs/Textual-Visual-Logic-Challenge.
Pages: 318-334
Page count: 17
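
The abstract names three baseline components: a relation understanding module, a multi-modality fusion module, and a negative pair discriminator. The following is a minimal PyTorch-style sketch of how such components could be wired together; all class names, dimensions, and design choices are illustrative assumptions, not the authors' implementation (the linked repository contains the actual code).

# Hypothetical sketch of the three components named in the abstract.
# Class names, dimensions, and wiring are assumptions for illustration only.
import torch
import torch.nn as nn


class RelationUnderstanding(nn.Module):
    # Encodes a logic-rich prompt so relational tokens can attend to the
    # entities they connect (assumed transformer-encoder design).
    def __init__(self, vocab_size=30522, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids):                   # (B, T) int64
        return self.encoder(self.embed(token_ids))  # (B, T, d_model)


class MultiModalityFusion(nn.Module):
    # Injects the relational text features into image latents via
    # cross-attention with a residual connection (assumed design).
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, image_latents, text_feats):   # (B, N, d), (B, T, d)
        fused, _ = self.cross_attn(image_latents, text_feats, text_feats)
        return image_latents + fused


class NegativePairDiscriminator(nn.Module):
    # Scores how well an image matches its prompt; training would contrast
    # matched pairs against mismatched (negative) pairs (assumed design).
    def __init__(self, d_model=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, image_feat, text_feat):       # pooled (B, d) each
        return self.score(torch.cat([image_feat, text_feat], dim=-1))


# Toy forward pass with random inputs, just to show the wiring.
text = RelationUnderstanding()(torch.randint(0, 30522, (2, 16)))
latents = torch.randn(2, 64, 512)                   # e.g. flattened image tokens
fused = MultiModalityFusion()(latents, text)
score = NegativePairDiscriminator()(fused.mean(1), text.mean(1))
print(fused.shape, score.shape)                     # (2, 64, 512), (2, 1)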