Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Times cited: 0

Authors:

Kwon, Gihyun [1,2]
Jenni, Simon [2]
Li, Dingzeyu [2]
Lee, Joon-Young [2]
Ye, Jong Chul [1]
Heilbron, Fabian Caba [2]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea

[2] Adobe, San Jose, CA 95110 USA

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

DOI:

10.1109/CVPR52733.2024.00848

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks generation into two steps: first creating a template image aligned with the semantics of the input prompt, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity than alternative approaches. Furthermore, the method is shown to handle more than two concepts seamlessly and to closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
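The fusion step described in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation and is not taken from the paper; it shows one simple, assumed form of concept fusion: blending per-concept noise predictions with spatial masks. It assumes that (a) each personalized concept has its own customized model that returns a noise prediction for the current latent, (b) a soft mask in [0, 1] marks where each subject appears in the template image, and (c) positions outside every mask fall back to a background prediction conditioned on the template prompt. The function name `fuse_noise_predictions`, the concept names, and the flattened list-of-floats "latents" are hypothetical simplifications.

```python
from typing import Dict, List


def fuse_noise_predictions(
    background_eps: List[float],
    concept_eps: Dict[str, List[float]],
    masks: Dict[str, List[float]],
) -> List[float]:
    """Blend per-concept noise predictions with spatial masks (toy sketch).

    Hypothetical helper, not the Concept Weaver implementation. Each list is a
    flattened toy "latent"; masks[name][i] in [0, 1] is how strongly position i
    belongs to concept `name`. Uncovered positions keep the background
    (template-prompt) prediction.
    """
    n = len(background_eps)
    for name, eps in concept_eps.items():
        if len(eps) != n or len(masks[name]) != n:
            raise ValueError(f"length mismatch for concept {name!r}")

    fused: List[float] = []
    for i in range(n):
        # Total mask weight at this position; masks are assumed not to overlap.
        coverage = sum(masks[name][i] for name in concept_eps)
        if coverage > 1.0 + 1e-9:
            raise ValueError(f"masks overlap at position {i} (total {coverage})")
        # The background prediction fills whatever the concept masks leave uncovered.
        value = (1.0 - coverage) * background_eps[i]
        for name, eps in concept_eps.items():
            value += masks[name][i] * eps[i]
        fused.append(value)
    return fused


# Usage: two made-up concepts occupying disjoint regions of a 4-position toy latent.
background = [0.0, 0.0, 0.0, 0.0]
predictions = {"dog": [1.0, 1.0, 1.0, 1.0], "cat": [2.0, 2.0, 2.0, 2.0]}
region_masks = {"dog": [1.0, 1.0, 0.0, 0.0], "cat": [0.0, 0.0, 1.0, 0.0]}
print(fuse_noise_predictions(background, predictions, region_masks))
# prints [1.0, 1.0, 2.0, 0.0]
```

In this toy version, keeping the masks disjoint is what stops one subject's appearance from leaking into another subject's region; the overlap check makes that assumption explicit instead of silently renormalizing the weights.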


Pages: 8880-8889

Number of pages: 10
