PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control

Cited by: 0
Authors
Parihar, Rishubh [1]
Sachidanand, V. S. [1]
Mani, Sabraswaran [2]
Karmali, Tejan [1]
Babu, R. Venkatesh [1]
Affiliations
[1] IISc Bangalore, Vis & AI Lab, Bengaluru, India
[2] IIT Kharagpur, Kharagpur, W Bengal, India
Source
Keywords
Personalised Image Generation; Fine-grained editing;
DOI
10.1007/978-3-031-73007-8_27
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recently, there has been a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. When applied to face personalization, existing approaches struggle to achieve convincing inversion with identity preservation, and they rely on semantic text-based editing of the generated face. Facial attribute editing, however, calls for finer-grained control, which is difficult to achieve with text prompts alone. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows precise manipulation of facial attributes, such as smoothly introducing a smile, while preserving the coarse text-based control inherent in T2I models. To condition the T2I model on the W+ space, we train a latent mapper that translates latent codes from W+ into the token embedding space of the T2I model. The proposed approach achieves precise inversion of face images with attribute preservation and supports continuous control for fine-grained attribute editing. Furthermore, it extends readily to compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
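The two ideas in the abstract, editing a W+ code along an attribute direction and mapping the result into the token embedding space, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy dimensions, the random "smile" direction, and the single affine layer standing in for the trained latent mapper are hypothetical, and the record does not describe the paper's actual mapper architecture or training procedure.

```python
import random

# Toy sizes chosen only so the sketch runs instantly.
# They are assumptions, not values taken from the paper.
NUM_LAYERS = 4   # number of per-layer style vectors in a W+ code
W_DIM = 8        # width of each style vector
TOKEN_DIM = 6    # width of one token embedding in the T2I text encoder


def edit_latent(w_plus, direction, alpha):
    """Move a W+ code along an attribute direction by strength alpha."""
    return [
        [w + alpha * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_plus, direction)
    ]


class LinearLatentMapper:
    """Hypothetical stand-in for a trained latent mapper: one affine map
    from a flattened W+ code to a single token embedding."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = NUM_LAYERS * W_DIM
        # Random weights play the role of learned parameters.
        self.weight = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(TOKEN_DIM)
        ]
        self.bias = [0.0] * TOKEN_DIM

    def __call__(self, w_plus):
        flat = [v for row in w_plus for v in row]
        return [
            sum(wi * xi for wi, xi in zip(w_row, flat)) + b
            for w_row, b in zip(self.weight, self.bias)
        ]


rng = random.Random(1)
# Random placeholders for an inverted face code and an attribute direction.
w_plus = [[rng.gauss(0.0, 1.0) for _ in range(W_DIM)] for _ in range(NUM_LAYERS)]
smile_direction = [[rng.gauss(0.0, 1.0) for _ in range(W_DIM)] for _ in range(NUM_LAYERS)]

mapper = LinearLatentMapper()
# Sweep the edit strength; each result is a token embedding that would stand
# in for a placeholder token in the prompt's embedding sequence.
tokens = [mapper(edit_latent(w_plus, smile_direction, a)) for a in (0.0, 0.5, 1.0)]
```

Because the stand-in mapper is affine, the token embedding changes linearly with the edit strength `alpha`, which is the simplest possible form of the continuous control the abstract describes. A trained mapper need not be linear.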
Pages: 469-487
Page count: 19