SpaText: Spatio-Textual Representation for Controllable Image Generation

Cited by: 15
Authors
Avrahami, Omri [1, 2]
Hayes, Thomas [1]
Gafni, Oran [1]
Gupta, Sonal [1]
Taigman, Yaniv [1]
Parikh, Devi [1]
Lischinski, Dani [2]
Fried, Ohad [3]
Yin, Xi [1]
Affiliations
[1] Meta AI, London, England
[2] Hebrew Univ Jerusalem, Jerusalem, Israel
[3] Reichman Univ, Herzliyya, Israel
DOI
10.1109/CVPR52729.2023.01762
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText, a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to the lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control.
Pages: 18370-18380
Page count: 11