StyleDrop: Text-to-Image Generation in Any Style

Cited by: 0

Authors:

Sohn, Kihyuk [1]
Ruiz, Nataniel [1]
Lee, Kimin [1]
Chin, Daniel Castro [1]
Blok, Irina [1]
Chang, Huiwen [1]
Barber, Jarred [1]
Jiang, Lu [1]
Entis, Glenn [1]
Li, Yuanzhen [1]
Hao, Yuan [1]
Essa, Irfan [1]
Rubinstein, Michael [1]
Krishnan, Dilip [1]

Affiliations:

[1] Google Research, Mountain View, CA 94043, USA

Source:

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts. However, ambiguities inherent in natural language and out-of-distribution effects make it hard to synthesize image styles that leverage a specific design pattern, texture, or material. In this paper, we introduce StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse [5] convincingly outperforms other methods, including DreamBooth [34] and textual inversion [11] on Imagen [35] or Stable Diffusion [33]. More results are available at our project website: https://styledrop.github.io.


Pages: 30
