PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control

Cited by: 0

Authors:

Parihar, Rishubh [1]

Sachidanand, V. S. [1]

Mani, Sabraswaran [2]

Karmali, Tejan [1]

Babu, R. Venkatesh [1]

Affiliations:

[1] IISc Bangalore, Vision & AI Lab, Bengaluru, India

[2] IIT Kharagpur, Kharagpur, West Bengal, India

Source:

COMPUTER VISION - ECCV 2024, PT LXXXII | 2025 / Vol. 15140

Keywords:

Personalised Image Generation; Fine-grained editing;

DOI:

10.1007/978-3-031-73007-8_27

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models that learn a concept from a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation and rely on semantic text-based editing of the generated face. However, more fine-grained control is desired for facial attribute editing, which is challenging to achieve solely with text prompts. In contrast, StyleGAN models learn a rich face prior and enable smooth, fine-grained attribute editing through latent manipulation. This work uses the disentangled W+ space of StyleGANs to condition the T2I model. This approach allows us to precisely manipulate facial attributes, such as smoothly introducing a smile, while preserving the existing coarse text-based control inherent in T2I models. To enable conditioning of the T2I model on the W+ space, we train a latent mapper to translate latent codes from W+ to the token embedding space of the T2I model. The proposed approach excels in the precise inversion of face images with attribute preservation and facilitates continuous control for fine-grained attribute editing. Furthermore, our approach can be readily extended to generate compositions involving multiple individuals. We perform extensive experiments to validate our method for face personalization and fine-grained attribute editing.
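
The continuous attribute control mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as is common in StyleGAN editing but not stated in this record, that an attribute such as a smile corresponds to a linear direction in W+ and that an edit has the form w + alpha * direction. The names `edit_latent` and `smile_direction`, and the toy dimensions, are hypothetical; the paper's latent mapper, which translates the edited code into T2I token embeddings, is not reproduced here.

```python
from typing import List

# A W+ code: one style vector per generator layer (toy sizes used below).
Latent = List[List[float]]


def edit_latent(w_plus: Latent, direction: Latent, alpha: float) -> Latent:
    """Shift a W+ code along an attribute direction by strength alpha.

    Assumption: the attribute is modeled as a linear direction in W+.
    """
    if len(w_plus) != len(direction):
        raise ValueError("w_plus and direction must have the same number of layers")
    edited = []
    for w_row, d_row in zip(w_plus, direction):
        if len(w_row) != len(d_row):
            raise ValueError("layer vectors must have the same dimension")
        edited.append([w + alpha * d for w, d in zip(w_row, d_row)])
    return edited


# Toy example: 2 layers x 3 dimensions (a real W+ code is much larger).
w_plus = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
smile_direction = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]  # hypothetical direction

# Sweeping alpha yields a sequence of gradually edited codes.
sweep = [edit_latent(w_plus, smile_direction, a) for a in (0.0, 0.5, 1.0)]
```

In this sketch, increasing `alpha` moves the code further along the assumed direction, which is what makes the edit continuous rather than an on/off text prompt change.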


Pages: 469-487

Page count: 19

