Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Citations: 0

Authors:

Wei, Fanyue [1]

Zeng, Wei [2]

Li, Zhenyang [2]

Yin, Dawei [2]

Duan, Lixin [1]

Li, Wen [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China

[2] Baidu Inc, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT XXVII | 2025 / Vol. 15085

Funding:

National Natural Science Foundation of China;

Keywords:

Personalized Text-to-Image Generation; Reinforcement Learning; Visual Fidelity;

DOI:

10.1007/978-3-031-73383-3_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, in this paper, we design a novel reinforcement learning framework by utilizing the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models to improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
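To illustrate why a deterministic policy gradient can make use of a non-differentiable objective, the following is a minimal toy sketch and not the authors' implementation. It assumes a scalar "action" in place of a generated image, a hypothetical piecewise-constant `reward` whose gradient is zero almost everywhere, and a linear critic fitted by least squares on perturbed actions. All names (`reward`, `critic_slope`, `train`) and hyperparameters are illustrative assumptions.

```python
import math
import random


def reward(action, target):
    # Hypothetical non-differentiable score: piecewise constant in the action,
    # so backpropagating through it directly would give a zero gradient.
    return -math.floor(10.0 * abs(action - target)) / 10.0


def critic_slope(actions, rewards):
    # Fit a linear critic Q(a) = b + w * a by least squares and return w,
    # which serves as the critic's gradient dQ/da.
    n = len(actions)
    mean_a = sum(actions) / n
    mean_r = sum(rewards) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in zip(actions, rewards))
    var = sum((a - mean_a) ** 2 for a in actions)
    return cov / var


def train(target, steps=300, batch=64, sigma=0.3, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # the deterministic policy simply outputs theta
    for _ in range(steps):
        # Explore around the current deterministic action.
        actions = [theta + rng.gauss(0.0, sigma) for _ in range(batch)]
        rewards = [reward(a, target) for a in actions]
        # Deterministic-policy-gradient style step: follow the critic's
        # gradient with respect to the action (d action / d theta = 1 here).
        theta += lr * critic_slope(actions, rewards)
    return theta


if __name__ == "__main__":
    final_action = train(target=2.0)
    print(final_action, reward(final_action, 2.0))
```

The design point is that the policy parameters are never differentiated through the reward itself, only through a differentiable critic that approximates it. In the paper's setting the policy would be the diffusion model and the rewards would be image-level objectives; that mapping is described only at a high level in the abstract.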

Pages: 394-410

Page count: 17
