SpaText: Spatio-Textual Representation for Controllable Image Generation

Cited by: 15
Authors
Avrahami, Omri [1, 2]
Hayes, Thomas [1]
Gafni, Oran [1]
Gupta, Sonal [1]
Taigman, Yaniv [1]
Parikh, Devi [1]
Lischinski, Dani [2]
Fried, Ohad [3]
Yin, Xi [1]
Affiliations
[1] Meta AI, London, England
[2] Hebrew Univ Jerusalem, Jerusalem, Israel
[3] Reichman Univ, Herzliyya, Israel
DOI
10.1109/CVPR52729.2023.01762
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText - a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control.
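The abstract mentions extending classifier-free guidance to several conditions but does not give the formula. As a rough illustration only, the sketch below shows one common way such an extension can be written: the unconditional noise prediction is shifted by a separately scaled difference for each condition (for example, the global text prompt and the spatio-textual map). The function name `multi_conditional_guidance`, its parameters, and the exact combination rule are assumptions made for this example and are not taken from the record above.

```python
from typing import List, Sequence


def multi_conditional_guidance(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    scales: Sequence[float],
) -> List[float]:
    """Combine noise predictions from several conditions (hypothetical helper).

    Assumed rule, not confirmed by the abstract:
        eps = eps_uncond + sum_i scales[i] * (eps_conds[i] - eps_uncond)

    eps_uncond: prediction with every condition dropped.
    eps_conds:  one prediction per condition, each the same length as eps_uncond.
    scales:     one guidance scale per condition.
    """
    if len(eps_conds) != len(scales):
        raise ValueError("need exactly one scale per condition")
    for cond in eps_conds:
        if len(cond) != len(eps_uncond):
            raise ValueError("all predictions must have the same length")

    guided = []
    for i, uncond_value in enumerate(eps_uncond):
        # Each condition pushes the prediction away from the unconditional one.
        shift = sum(s * (cond[i] - uncond_value) for cond, s in zip(eps_conds, scales))
        guided.append(uncond_value + shift)
    return guided


# Toy example with two conditions and flattened two-element "predictions".
result = multi_conditional_guidance(
    eps_uncond=[0.0, 1.0],
    eps_conds=[[1.0, 1.0], [0.0, 3.0]],
    scales=[2.0, 0.5],
)
print(result)
```

With a single condition and a scale of 1.0, this rule returns the conditional prediction unchanged, which matches ordinary classifier-free guidance; separate scales let each condition be weighted independently.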
Pages: 18370-18380
Number of pages: 11
Related papers
50 in total
  • [21] Hybrid-LSH for Spatio-Textual Similarity Queries
    Zhu, Mingdong
    Shen, Derong
    Liu, Ling
    Yu, Ge
    WEB TECHNOLOGIES AND APPLICATIONS (APWEB 2015), 2015, 9313 : 166 - 177
  • [22] Top-k Spatio-Textual Similarity Join
    Hu, Huiqi
    Li, Guoliang
    Bao, Zhifeng
    Feng, Jianhua
    Wu, Yongwei
    Gong, Zhiguo
    Xu, Yaoqiang
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1576 - 1577
  • [23] An Efficient Algorithm for Spatio-Textual Object Cluster Join
    Chen, Mingming
    Wang, Ning
    Zhu, Daxin
    Shang, Jedi S.
    BIG DATA RESEARCH, 2021, 25
  • [24] Efficient Spatio-Textual Similarity Join Processing on NUMA Systems
    Gautam, Saransh
    Ray, Suprio
    Nickerson, Bradford G.
    2021 22ND IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2021), 2021, : 79 - 88
  • [25] Maximizing Influence of Spatio-Textual Objects Based on Keyword Selection
    Gkorgkas, Orestis
    Vlachou, Akrivi
    Doulkeridis, Christos
    Norvag, Kjetil
    ADVANCES IN SPATIAL AND TEMPORAL DATABASES (SSTD 2015), 2015, 9239 : 413 - 430
  • [26] Parameterized Spatio-Textual Publish/Subscribe in Road Sensor Networks
    Li, Yanhong
    Huang, Ziqing
    Zhu, Rongbo
    Li, Guohui
    Shu, Lihchyun
    Tian, Shasha
    Ma, Maode
    IEEE ACCESS, 2017, 5 : 22940 - 22952
  • [27] Authenticated Spatio-textual Similarity Joins in Untrusted Cloud Environments
    Yan, Han
    Cheng, Xiang
    Su, Sen
    Zhang, Qiying
    Xu, Jianliang
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 685 - 694
  • [28] Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth
    Ge, Mouzhi
    Chondrogiannis, Theodoros
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2017, 2017, 767 : 12 - 20
  • [29] Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream
    Chen, Zhida
    Cong, Gao
    Zhang, Zhenjie
    Fu, Tom Z. J.
    Chen, Lisi
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 1095 - 1106
  • [30] FAST: Frequency-Aware Indexing for Spatio-Textual Data Streams
    Mahmood, Ahmed R.
    Aly, Ahmed M.
    Aref, Walid G.
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 305 - 316