Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Authors
Kwon, Gihyun [1, 2]
Jenni, Simon [2]
Li, Dingzeyu [2]
Lee, Joon-Young [2]
Ye, Jong Chul [1]
Heilbron, Fabian Caba [2]
Affiliations
[1] Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
[2] Adobe, San Jose, CA 95110 USA
DOI
10.1109/CVPR52733.2024.00848
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with the semantics of input prompts, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity compared to alternative approaches. Furthermore, the method is shown to seamlessly handle more than two concepts and closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
Pages: 8880-8889
Number of pages: 10