Improving text-to-image generation with object layout guidance

Cited by: 9
Authors
Zakraoui, Jezia [1 ]
Saleh, Moutaz [1 ]
Al-Maadeed, Somaya [1 ]
Jaam, Jihad Mohammed [1 ]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha 2713, Qatar
Keywords
Image generation; Text processing; Scene graph; Object layout; Conditioning augmentation; StackGAN;
DOI
10.1007/s11042-021-11038-0
Chinese Library Classification (CLC) number
TP [automation technology, computer technology]
Discipline code
0812
Abstract
The automatic generation of realistic images directly from a story text is a very challenging problem, as it cannot be addressed by a single image generation approach, mainly due to the semantic complexity of the story text constituents. In this work, we propose a new approach that decomposes the task of story visualization into three phases: semantic text understanding, object layout prediction, and image generation and refinement. We start by simplifying the text into a scene graph triple notation that encodes semantic relationships between the story objects. We then introduce an object layout module to capture the features of these objects from the corresponding scene graph. Specifically, the object layout module aggregates individual object features from the scene graph as well as averaged or likelihood object features generated by a graph convolutional neural network. All these features are concatenated to form semantic triples that are then provided to the image generation framework. For the image generation phase, we adopt a scene graph image generation framework as stage-I, which is refined by a StackGAN as stage-II conditioned on the object layout module and the output image generated in stage-I. Our approach renders object details in high-resolution images while keeping the image structure consistent with the input text. To evaluate the performance of our approach, we use the COCO dataset and compare our approach with three baselines, namely sg2im, StackGAN, and AttnGAN, in terms of image quality and user evaluation. The assessment results show that our object layout guidance-based approach significantly outperforms these baselines in both the accuracy of semantic matching and the realism of the images generated for the story text sentences.
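As an illustrative sketch only (not code from the paper), the scene-graph triple notation and the "individual plus averaged" feature aggregation described in the abstract could look roughly as follows; all names, embedding sizes, and the random embeddings standing in for graph-convolutional outputs are hypothetical:

```python
import numpy as np

# A story sentence reduced to scene-graph triples: (subject, predicate, object).
triples = [
    ("boy", "holding", "kite"),
    ("kite", "above", "grass"),
]

# Hypothetical per-object embeddings; in the paper these would come from a
# graph convolutional network over the scene graph. Dimensions are arbitrary.
rng = np.random.default_rng(0)
obj_feats = {name: rng.normal(size=8) for name in ("boy", "kite", "grass")}

def layout_feature(subj, pred, obj):
    """Concatenate the two objects' individual features with their average,
    mimicking the abstract's individual-plus-averaged aggregation."""
    s, o = obj_feats[subj], obj_feats[obj]
    avg = (s + o) / 2.0
    return np.concatenate([s, o, avg])  # passed on to the image generator

feats = [layout_feature(*t) for t in triples]
```

Each triple thus yields one fixed-length conditioning vector (here 24-dimensional), which is the kind of semantic-triple feature the abstract says is fed to the stage-I generator and the StackGAN refinement stage.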
Pages: 27423-27443 (21 pages)
Related papers (50 entries in total)
  • [21] Gong, Biao; Huang, Siteng; Feng, Yutong; Zhang, Shiwei; Li, Yuyuan; Liu, Yu. Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 6624-6634.
  • [22] Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya. Zero-Shot Text-to-Image Generation. International Conference on Machine Learning (ICML), Vol. 139, 2021.
  • [23] Kim, Yunji; Lee, Jiyoung; Kim, Jin-Hwa; Ha, Jung-Woo; Zhu, Jun-Yan. Dense Text-to-Image Generation with Attention Modulation. IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 7667-7677.
  • [24] Cho, Jaemin; Zala, Abhay; Bansal, Mohit. Visual Programming for Text-to-Image Generation and Evaluation. Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  • [25] Qiao, Tingting; Zhang, Jing; Xu, Duanqing; Tao, Dacheng. MirrorGAN: Learning Text-to-Image Generation by Redescription. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 1505-1514.
  • [26] Sohn, Kihyuk; Ruiz, Nataniel; Lee, Kimin; Chin, Daniel Castro; Blok, Irina; Chang, Huiwen; Barber, Jarred; Jiang, Lu; Entis, Glenn; Li, Yuanzhen; Hao, Yuan; Essa, Irfan; Rubinstein, Michael; Krishnan, Dilip. StyleDrop: Text-to-Image Generation in Any Style. Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  • [27] Hinz, Tobias; Heinrich, Stefan; Wermter, Stefan. Semantic Object Accuracy for Generative Text-to-Image Synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(3): 1552-1565.
  • [28] Oppenlaender, Jonas. A Taxonomy of Prompt Modifiers for Text-to-Image Generation. Behaviour & Information Technology, 2024, 43(15): 3763-3776.
  • [29] Xue, Z.; Xu, Z.; Lang, C.; Feng, S.; Wang, T.; Li, Y. Text-to-Image Generation Method Based on Image-Text Semantic Consistency. Jisuanji Yanjiu yu Fazhan / Computer Research and Development, 2023, 60(9): 2180-2190.
  • [30] Wang, Zekang; Liu, Li; Zhang, Huaxiang; Liu, Dongmei; Song, Yu. Generative Adversarial Text-to-Image Generation with Style Image Constraint. Multimedia Systems, 2023, 29: 3291-3303.