SpaText: Spatio-Textual Representation for Controllable Image Generation

Citations: 15
Authors
Avrahami, Omri [1, 2]
Hayes, Thomas [1]
Gafni, Oran [1]
Gupta, Sonal [1]
Taigman, Yaniv [1]
Parikh, Devi [1]
Lischinski, Dani [2]
Fried, Ohad [3]
Yin, Xi [1]
Affiliations
[1] Meta AI, London, England
[2] Hebrew Univ Jerusalem, Jerusalem, Israel
[3] Reichman Univ, Herzliyya, Israel
DOI
10.1109/CVPR52729.2023.01762
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent text-to-image diffusion models generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions or objects, or their layout, in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText, a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map in which each region of interest is annotated with a free-form natural language description. Because no large-scale dataset provides a detailed textual description for each region of an image, we leverage existing large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, showing its effectiveness on two state-of-the-art diffusion models: one pixel-based and one latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control.
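The abstract mentions extending classifier-free guidance to several conditions but does not give the formula. As a rough illustration only, the sketch below shows one plausible way to do this: start from the unconditional noise prediction and add a separately scaled correction for each condition (for example, the global prompt and the spatio-textual map). The function name `multi_cond_guidance`, its parameters, and the additive form are assumptions made for this example and may differ from the formulation in the paper.

```python
from typing import Sequence


def multi_cond_guidance(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    scales: Sequence[float],
) -> list[float]:
    """Combine noise predictions under several conditions (hypothetical helper).

    Computes eps_uncond + sum_i scales[i] * (eps_conds[i] - eps_uncond),
    which is an assumed additive extension of classifier-free guidance,
    not necessarily the exact rule used by SpaText.
    """
    if len(eps_conds) != len(scales):
        raise ValueError("need exactly one guidance scale per condition")

    guided = list(eps_uncond)
    for eps_c, scale in zip(eps_conds, scales):
        if len(eps_c) != len(eps_uncond):
            raise ValueError("all predictions must have the same length")
        # Push the estimate along the direction this condition suggests.
        for i, (cond_val, uncond_val) in enumerate(zip(eps_c, eps_uncond)):
            guided[i] += scale * (cond_val - uncond_val)
    return guided


if __name__ == "__main__":
    # Toy two-element "noise predictions" standing in for model outputs.
    uncond = [0.0, 1.0]
    with_text = [1.0, 1.0]     # prediction given only the global prompt
    with_layout = [0.0, 3.0]   # prediction given only the layout condition
    print(multi_cond_guidance(uncond, [with_text, with_layout], [2.0, 0.5]))
```

With a single condition this reduces to standard classifier-free guidance. Note that this form needs one model evaluation per condition plus one unconditional evaluation at every denoising step, which is presumably the cost that an accelerated inference variant would aim to reduce.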
Pages: 18370-18380
Page count: 11