SpaText: Spatio-Textual Representation for Controllable Image Generation

Cited by: 15
Authors
Avrahami, Omri [1, 2]
Hayes, Thomas [1]
Gafni, Oran [1]
Gupta, Sonal [1]
Taigman, Yaniv [1]
Parikh, Devi [1]
Lischinski, Dani [2]
Fried, Ohad [3]
Yin, Xi [1]
Affiliations
[1] Meta AI, London, England
[2] Hebrew Univ Jerusalem, Jerusalem, Israel
[3] Reichman Univ, Herzliyya, Israel
DOI
10.1109/CVPR52729.2023.01762
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText, a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to the lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control.
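The abstract mentions extending classifier-free guidance to the multi-conditional case but does not give the formula. The sketch below illustrates one common way such an extension can be written, and is not taken from this record: start from the unconditional noise prediction and add one separately scaled correction per condition (for example, the global prompt and the spatio-textual map). The function name `multi_cond_cfg`, its parameters, and the combination rule itself are assumptions made for illustration.

```python
from typing import List, Sequence


def multi_cond_cfg(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    scales: Sequence[float],
) -> List[float]:
    """Combine an unconditional and several conditional noise predictions.

    Assumed rule (hypothetical, not quoted from the abstract):
        eps = eps_uncond + sum_i scales[i] * (eps_conds[i] - eps_uncond)
    """
    if len(eps_conds) != len(scales):
        raise ValueError("need exactly one guidance scale per condition")

    guided = list(eps_uncond)  # start from the unconditional prediction
    for eps_c, scale in zip(eps_conds, scales):
        if len(eps_c) != len(eps_uncond):
            raise ValueError("all predictions must have the same length")
        # Add this condition's scaled offset from the unconditional prediction.
        for k, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            guided[k] += scale * (c - u)
    return guided


# Toy two-element "noise predictions" for two conditions.
example = multi_cond_cfg([0.0, 1.0], [[1.0, 1.0], [0.0, 3.0]], [2.0, 0.5])
```

With a single condition and a scale of 1, this rule returns the conditional prediction unchanged; with all scales set to 0, it returns the unconditional prediction.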
Pages: 18370 - 18380
Number of pages: 11