Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Cited: 0
Authors
Wei, Fanyue [1 ]
Zeng, Wei [2 ]
Li, Zhenyang [2 ]
Yin, Dawei [2 ]
Duan, Lixin [1 ]
Li, Wen [1 ]
Affiliations
[1] University of Electronic Science and Technology of China, Chengdu, China
[2] Baidu Inc., Beijing, China
Funding
National Natural Science Foundation of China
Keywords
Personalized Text-to-Image Generation; Reinforcement Learning; Visual Fidelity
DOI
10.1007/978-3-031-73383-3_23
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, in this paper, we design a novel reinforcement learning framework utilizing the deterministic policy gradient method for personalized text-to-image generation, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin in visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
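To make the idea in the abstract concrete, the sketch below shows a deterministic-policy-gradient style update in which a diffusion denoiser plays the role of a deterministic policy and a learned critic approximates a possibly non-differentiable reward, so that such a reward can still supervise the denoiser. This is a minimal toy sketch, not the authors' implementation: the `Denoiser`, `Critic`, and `reward_fn` names, the shapes, and the single-step training loop are illustrative assumptions.

```python
# Minimal sketch of a deterministic-policy-gradient update for a diffusion denoiser.
# All module names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a diffusion U-Net: maps a noisy latent and timestep to a denoised prediction (the 'action')."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

class Critic(nn.Module):
    """Learned critic Q(state, action) that approximates a possibly non-differentiable reward."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x_t, t, action):
        return self.net(torch.cat([x_t, t, action], dim=-1))

def reward_fn(action, reference):
    """Placeholder structural-consistency reward; in practice this could be any score, even non-differentiable."""
    with torch.no_grad():
        return -((action - reference) ** 2).mean(dim=-1, keepdim=True)

dim, batch = 16, 8
actor, critic = Denoiser(dim), Critic(dim)
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

x_t = torch.randn(batch, dim)        # noisy latent (state)
t = torch.rand(batch, 1)             # toy timestep embedding
reference = torch.randn(batch, dim)  # toy encoding of the reference image

# 1) Fit the critic to the reward observed for the current deterministic policy.
with torch.no_grad():
    action = actor(x_t, t)
r = reward_fn(action, reference)
critic_loss = ((critic(x_t, t, action) - r) ** 2).mean()
opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

# 2) Deterministic policy gradient: move the denoiser's output uphill on the critic.
actor_loss = -critic(x_t, t, actor(x_t, t)).mean()
opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```

In the paper's setting, the state would correspond to the noisy latent together with the text and reference conditioning, the action to the denoised prediction, and the reward to any fidelity or alignment score; the critic is what allows a non-differentiable reward to supply gradients for the denoiser update.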
Pages: 394-410
Page count: 17
Related Papers
50 items in total (10 shown)
  • [1] Subject-driven Text-to-Image Generation via Apprenticeship Learning. Chen, Wenhu; Hu, Hexiang; Li, Yandong; Ruiz, Nataniel; Jia, Xuhui; Chang, Ming-Wei; Cohen, William W. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
  • [2] MirrorGAN: Learning Text-to-image Generation by Redescription. Qiao, Tingting; Zhang, Jing; Xu, Duanqing; Tao, Dacheng. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1505-1514.
  • [3] CogView: Mastering Text-to-Image Generation via Transformers. Ding, Ming; Yang, Zhuoyi; Hong, Wenyi; Zheng, Wendi; Zhou, Chang; Yin, Da; Lin, Junyang; Zou, Xu; Shao, Zhou; Yang, Hongxia; Tang, Jie. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34.
  • [4] MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation. Wei, Yuxiang; Ji, Zhilong; Bai, Jinfeng; Zhang, Hongzhi; Zhang, Lei; Zuo, Wangmeng. COMPUTER VISION - ECCV 2024, PT LI, 2025, 15109: 252-271.
  • [5] Variational Distribution Learning for Unsupervised Text-to-Image Generation. Kang, Minsoo; Lee, Doyup; Kim, Jiseob; Kim, Saehoon; Han, Bohyung. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 23380-23389.
  • [6] Controllable Text-to-Image Generation. Li, Bowen; Qi, Xiaojuan; Lukasiewicz, Thomas; Torr, Philip H. S. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32.
  • [7] Surgical text-to-image generation. Nwoye, Chinedu Innocent; Bose, Rupak; Elgohary, Kareem; Arboit, Lorenzo; Carlino, Giorgio; Lavanchy, Joel L.; Mascagni, Pietro; Padoy, Nicolas. PATTERN RECOGNITION LETTERS, 2025, 190: 73-80.
  • [8] Parrot: Pareto-Optimal Multi-reward Reinforcement Learning Framework for Text-to-Image Generation. Lee, Seung Hyun; Li, Yinxiao; Ke, Junjie; Yoo, Innfarn; Zhang, Han; Yu, Jiahui; Wang, Qifei; Deng, Fei; Entis, Glenn; He, Junfeng; Li, Gang; Kim, Sangpil; Essa, Irfan; Yang, Feng. COMPUTER VISION - ECCV 2024, PT XXXVIII, 2025, 15096: 462-478.
  • [9] Muse: Text-To-Image Generation via Masked Generative Transformers. Chang, Huiwen; Zhang, Han; Barber, Jarred; Maschinot, A. J.; Lezama, Jose; Jiang, Lu; Yang, Ming-Hsuan; Murphy, Kevin; Freeman, William T.; Rubinstein, Michael; Li, Yuanzhen; Krishnan, Dilip. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202.
  • [10] Text-to-Image Generation via Semi-Supervised Training. Ji, Zhongyi; Wang, Wenmin; Chen, Baoyang; Han, Xiao. 2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020: 265-268.