Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Authors
Wei, Fanyue [1]
Zeng, Wei [2]
Li, Zhenyang [2]
Yin, Dawei [2]
Duan, Lixin [1]
Li, Wen [1]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Personalized Text-to-Image Generation; Reinforcement Learning; Visual Fidelity;
DOI
10.1007/978-3-031-73383-3_23
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Personalized text-to-image models allow users to generate images in varied styles (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To address this, we design a novel reinforcement learning framework that uses the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
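The idea that a deterministic policy gradient can optimize a non-differentiable objective can be illustrated with a toy sketch. This is not the authors' implementation and involves no diffusion model: the scalar policy, the quantized reward, the linear critic, and every name and hyperparameter below are assumptions chosen for illustration. A critic is fit to observed rewards, and the policy parameter is then moved along the critic's gradient with respect to the action.

```python
import random


def reward(action: float, target: float = 0.7) -> float:
    """Toy non-differentiable reward (assumption): quantize, then score."""
    # Rounding to a 0.05 grid makes the reward piecewise constant,
    # so its gradient with respect to the action is zero almost everywhere.
    quantized = round(action * 20) / 20
    return -abs(quantized - target)


def fit_critic_slope(actions, rewards):
    """Least-squares slope w1 of a linear critic Q(a) = w0 + w1 * a."""
    n = len(actions)
    mean_a = sum(actions) / n
    mean_r = sum(rewards) / n
    cov = sum((a - mean_a) * (r - mean_r) for a, r in zip(actions, rewards))
    var = sum((a - mean_a) ** 2 for a in actions)
    return cov / var


def train(steps=200, batch=64, sigma=0.3, lr=0.05, seed=0):
    """Move a deterministic policy (action = theta) along the critic gradient."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        # Explore around the current deterministic action.
        actions = [theta + sigma * rng.gauss(0.0, 1.0) for _ in range(batch)]
        rewards = [reward(a) for a in actions]
        # The critic is differentiable even though the reward is not.
        w1 = fit_critic_slope(actions, rewards)
        # Deterministic policy gradient step: dQ/da * da/dtheta, with da/dtheta = 1.
        theta += lr * w1
    return theta


if __name__ == "__main__":
    print(f"learned action: {train():.3f}")
```

In this sketch the reward is only ever evaluated, never differentiated; the gradient signal comes entirely from the fitted critic, which is what lets a non-differentiable objective supervise the policy.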
Pages: 394 - 410
Page count: 17