PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control

Citations: 0
Authors
Parihar, Rishubh [1 ]
Sachidanand, V. S. [1 ]
Mani, Sabraswaran [2 ]
Karmali, Tejan [1 ]
Babu, R. Venkatesh [1 ]
Affiliations
[1] IISc Bangalore, Vis & AI Lab, Bengaluru, India
[2] IIT Kharagpur, Kharagpur, W Bengal, India
Keywords
Personalised Image Generation; Fine-grained editing
DOI
10.1007/978-3-031-73007-8_27
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. When used for face personalization, existing approaches struggle to achieve convincing inversion with identity preservation and rely on semantic, text-based editing of the generated face. However, finer-grained control is desired for facial attribute editing, and this is difficult to achieve with text prompts alone. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the coarse text-based control inherent to T2I models. To condition the T2I model on the W+ space, we train a latent mapper that translates latent codes from W+ to the token embedding space of the T2I model. The proposed approach excels at precise inversion of face images with attribute preservation and enables continuous control for fine-grained attribute editing. Furthermore, it readily extends to generating compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
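The mechanism outlined in the abstract is a learned mapper that projects a StyleGAN W+ latent into the token embedding space of the T2I model, so that attribute edits made in W+ carry over to the generated image. The sketch below illustrates this idea only; the MLP architecture, the W+ shape (18 x 512, as in StyleGAN2 at 1024x1024), the 768-dimensional token embedding (as in Stable Diffusion v1's CLIP text encoder), and the placeholder attribute direction are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' released code): a latent mapper from a
# StyleGAN W+ code to a single token embedding of a T2I model, so a face can
# be referenced via a pseudo-token in the prompt. All dimensions are assumed.
import torch
import torch.nn as nn


class WPlusToTokenMapper(nn.Module):
    """Maps a W+ latent (num_layers x w_dim) to one token embedding."""

    def __init__(self, num_wplus_layers: int = 18, w_dim: int = 512, token_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_wplus_layers * w_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, token_dim),
        )

    def forward(self, w_plus: torch.Tensor) -> torch.Tensor:
        # w_plus: (batch, num_layers, w_dim) -> (batch, token_dim)
        return self.net(w_plus.flatten(start_dim=1))


if __name__ == "__main__":
    mapper = WPlusToTokenMapper()
    w_plus = torch.randn(1, 18, 512)        # W+ code from a StyleGAN inversion
    face_token = mapper(w_plus)             # embedding for a placeholder token, e.g. "<face>"
    print(face_token.shape)                 # torch.Size([1, 768])

    # Fine-grained editing: shift W+ along an attribute direction (e.g. "smile")
    # and re-map; the edited embedding then conditions the T2I model via the prompt.
    smile_direction = torch.randn(18, 512)  # placeholder for a learned/known direction
    edited_token = mapper(w_plus + 0.5 * smile_direction)
```

In this reading, continuous control comes from scaling the W+ offset before mapping, while coarse scene-level changes remain available through the surrounding text prompt.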
Pages: 469-487
Page count: 19
Related papers
50 in total
  • [1] Fine-Grained Human Hair Segmentation Using a Text-to-Image Diffusion Model
    Kim, Dohyun
    Lee, Euna
    Yoo, Daehyun
    Lee, Hongchul
    IEEE ACCESS, 2024, 12 : 13912 - 13922
  • [2] Text-to-Image Generation Grounded by Fine-Grained User Attention
    Koh, Jing Yu
    Baldridge, Jason
    Lee, Honglak
    Yang, Yinfei
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 237 - 246
  • [3] Decoupling Control in Text-to-Image Diffusion Models
    Cao, Shitong
    Zhang, Xuejie
    Wang, Jin
    Zhou, Xiaobing
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VII, ICIC 2024, 2024, 14868 : 312 - 322
  • [4] InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
    Hoe, Jiun Tian
    Jiang, Xudong
    Chan, Chee Seng
    Tan, Yap-Peng
    Hu, Weipeng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6180 - 6189
  • [5] Adding Conditional Control to Text-to-Image Diffusion Models
    Zhang, Lvmin
    Rao, Anyi
    Agrawala, Maneesh
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3813 - 3824
  • [6] GrainedCLIP and DiffusionGrainedCLIP: Text-Guided Advanced Models for Fine-Grained Attribute Face Image Processing
    Zhu, Jincheng
    Mu, Liwei
    IEEE ACCESS, 2023, 11 : 99030 - 99045
  • [7] Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis
    Yang, Yanhua
    Wang, Lei
    Xie, De
    Deng, Cheng
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 (30) : 2798 - 2809
  • [8] Fine-Grained Cross-Modal Fusion Based Refinement for Text-to-Image Synthesis
    SUN Haoran
    WANG Yang
    LIU Haipeng
    QIAN Biao
    Chinese Journal of Electronics, 2023, 32 (06) : 1329 - 1340
  • [10] Debiasing Text-to-Image Diffusion Models
    He, Ruifei
    Xue, Chuhui
    Tan, Haoru
    Zhang, Wenqing
    Yu, Yingchen
    Bai, Song
    Qi, Xiaojuan
    PROCEEDINGS OF THE 1ST ACM MULTIMEDIA WORKSHOP ON MULTI-MODAL MISINFORMATION GOVERNANCE IN THE ERA OF FOUNDATION MODELS, MIS 2024, 2024, : 29 - 36