Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Authors
Wei, Fanyue [1]
Zeng, Wei [2]
Li, Zhenyang [2]
Yin, Dawei [2]
Duan, Lixin [1]
Li, Wen [1]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Personalized Text-to-Image Generation; Reinforcement Learning; Visual Fidelity
DOI
10.1007/978-3-031-73383-3_23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Personalized text-to-image models allow users to generate images of an object (specified with a set of reference images) in varied styles (specified with a sentence). While diffusion-based generation models have achieved remarkable results, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To address this, we design a novel reinforcement learning framework that uses the deterministic policy gradient method for personalized text-to-image generation. With this framework, various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
Pages: 394-410
Page count: 17