Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Authors
Wei, Fanyue [1]
Zeng, Wei [2]
Li, Zhenyang [2]
Yin, Dawei [2]
Duan, Lixin [1]
Li, Wen [1]
Affiliations
[1] University of Electronic Science and Technology of China, Chengdu, People's Republic of China
[2] Baidu Inc., Beijing, People's Republic of China
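Funding
National Natural Science Foundation of China
Keywords
Personalized Text-to-Image Generation; Reinforcement Learning; Visual Fidelity
DOI
10.1007/978-3-031-73383-3_23
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract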
Personalized text-to-image models allow users to generate images of an object (specified with a set of reference images) in varied styles (specified with a sentence). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, we design a novel reinforcement learning framework for personalized text-to-image generation based on the deterministic policy gradient method, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
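The abstract describes the method only at a high level. As a loose, hypothetical illustration of the general deterministic policy gradient (DPG) idea it builds on, and not the authors' implementation, the sketch below trains a toy deterministic policy mu_theta(s) = theta * s against a black-box reward that is never differentiated. Everything in it is an assumption chosen to keep the example self-contained: the scalar state, the non-differentiable reward -|a - 2s|, the quadratic least-squares critic, and all function names and hyperparameters. The key step is that a learned differentiable critic Q_w(s, a) supplies grad_a Q, which is chained with d mu / d theta to update the policy.

```python
import random


def solve_linear(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    # Augmented matrix [mat | vec], copied so the inputs are not modified.
    aug = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def features(s, a):
    """Hypothetical quadratic features for a toy critic Q_w(s, a) = w . phi(s, a)."""
    return [1.0, s, s * s, a, a * s, a * a]


def reward(s, a):
    """Hypothetical black-box reward with a kink at a == 2s; never differentiated."""
    return -abs(a - 2.0 * s)


def train_dpg(steps=100, batch=256, noise=0.3, lr=0.2, ridge=1e-6, seed=0):
    """Toy DPG loop (not from the paper): returns the learned policy parameter."""
    rng = random.Random(seed)
    theta = 0.0  # deterministic policy mu_theta(s) = theta * s
    dim = 6
    for _ in range(steps):
        # 1) Collect exploratory samples around the deterministic action.
        states = [rng.uniform(0.5, 1.5) for _ in range(batch)]
        actions = [theta * s + rng.gauss(0.0, noise) for s in states]
        rewards = [reward(s, a) for s, a in zip(states, actions)]
        # 2) Fit the critic by ridge-regularized least squares (normal equations).
        gram = [[0.0] * dim for _ in range(dim)]
        rhs = [0.0] * dim
        for s, a, r in zip(states, actions, rewards):
            phi = features(s, a)
            for i in range(dim):
                rhs[i] += phi[i] * r
                for j in range(dim):
                    gram[i][j] += phi[i] * phi[j]
        for i in range(dim):
            gram[i][i] += ridge
        w = solve_linear(gram, rhs)
        # 3) DPG step: grad_a Q at a = mu(s), times d mu / d theta = s.
        grad = 0.0
        for s in states:
            a = theta * s
            dq_da = w[3] + w[4] * s + 2.0 * w[5] * a
            grad += dq_da * s
        theta += lr * grad / batch
    return theta
```

Calling `train_dpg()` should return a value close to 2.0, the parameter that maximizes this toy reward, even though the reward's gradient is never used. For simplicity the critic is refit from scratch at every step; practical actor-critic methods usually keep a neural critic and update it incrementally.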
Pages: 394-410
Number of pages: 17