Prompt Refinement with Image Pivot for Text-to-Image Generation

Cited by: 0
Authors
Zhan, Jingtao [1]
Ai, Qingyao [1]
Liu, Yiqun [1]
Pan, Yingwei [2]
Yao, Ting [2]
Mao, Jiaxin [3]
Ma, Shaoping [1]
Mei, Tao [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Zhongguancun Lab, Beijing, Peoples R China
[2] HiDream Ai, Beijing, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages, and then translating those image representations into system languages. It can therefore leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
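The two-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`user_to_pivot`, `pivot_to_system`, `refine`), the three-dimensional "latent" vector, the cue words, and the keyword table are all hypothetical stand-ins chosen only to show how a pivot representation lets two separately built stages be composed without any user-to-system parallel data.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical cue words mapped to a toy 3-dim "image latent"
# (axes: nature, portrait, night). A real system would learn this mapping.
CUES: Dict[str, Vector] = {
    "forest": [1.0, 0.0, 0.0],
    "face": [0.0, 1.0, 0.0],
    "stars": [0.0, 0.0, 1.0],
}

# Hypothetical "system language" keywords, one group per latent axis.
KEYWORDS: List[str] = [
    "lush foliage, volumetric light",
    "detailed skin, 85mm",
    "long exposure, astrophotography",
]


def user_to_pivot(prompt: str) -> Vector:
    """Stage 1 stand-in: infer a preferred-image representation from user text."""
    pivot = [0.0, 0.0, 0.0]
    for word in prompt.lower().split():
        for i, value in enumerate(CUES.get(word, [0.0, 0.0, 0.0])):
            pivot[i] += value
    return pivot


def pivot_to_system(pivot: Vector) -> List[str]:
    """Stage 2 stand-in: translate the image representation into system keywords."""
    return [kw for kw, weight in zip(KEYWORDS, pivot) if weight > 0]


def refine(prompt: str) -> str:
    """Compose the two stages through the pivot.

    Appending the keywords to the original prompt is a choice made for this
    toy example, not something stated in the abstract.
    """
    return ", ".join([prompt] + pivot_to_system(user_to_pivot(prompt)))


if __name__ == "__main__":
    print(refine("a forest under stars"))
```

Because the two stages only meet at the pivot, each could in principle be trained on its own data (prompt-image pairs for the first, image-prompt pairs for the second), which is the property the abstract attributes to the pivot design.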
Pages: 941-954
Page count: 14
Related papers
50 in total
  • [21] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
    Chen, Zhennan
    Li, Yajie
    Wang, Haofan
    Chen, Zhibo
    Jiang, Zhengkai
    Li, Jun
    Wang, Qian
    Yang, Jian
    Tai, Ying
    arXiv,
  • [22] Visual Programming for Text-to-Image Generation and Evaluation
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] MirrorGAN: Learning Text-to-image Generation by Redescription
    Qiao, Tingting
    Zhang, Jing
    Xu, Duanqing
    Tao, Dacheng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1505 - 1514
  • [24] Zero-Shot Text-to-Image Generation
    Ramesh, Aditya
    Pavlov, Mikhail
    Goh, Gabriel
    Gray, Scott
    Voss, Chelsea
    Radford, Alec
    Chen, Mark
    Sutskever, Ilya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [25] Dense Text-to-Image Generation with Attention Modulation
    Kim, Yunji
    Lee, Jiyoung
    Kim, Jin-Hwa
    Ha, Jung-Woo
    Zhu, Jun-Yan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7667 - 7677
  • [26] StyleDrop: Text-to-Image Generation in Any Style
    Sohn, Kihyuk
    Ruiz, Nataniel
    Lee, Kimin
    Chin, Daniel Castro
    Blok, Irina
    Chang, Huiwen
    Barber, Jarred
    Jiang, Lu
    Entis, Glenn
    Li, Yuanzhen
    Hao, Yuan
    Essa, Irfan
    Rubinstein, Michael
    Krishnan, Dilip
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [27] Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models
    Elsharif, Wala
    She, James
    Nakov, Preslav
    Wong, Simon
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON INTERACTIVE MEDIA EXPERIENCES, IMX 2023, 2023, : 276 - 288
  • [28] EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
    Yang, Jingyuan
    Feng, Jiawei
    Huang, Hui
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6358 - 6368
  • [29] GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation
    Gong, Jingzhi
    Li, Sisi
    D'Aloisio, Giordano
    Ding, Zishuo
    Ye, Yulong
    Langdon, William B.
    Sarro, Federica
    SEARCH-BASED SOFTWARE ENGINEERING, SSBSE 2024, 2024, 14767 : 70 - 76
  • [30] Prompt suffix-attack against text-to-image diffusion models
    Xiong, Siyun
    Du, Yanhui
    Wang, Zhuohao
    Sun, Peiqi
    NEUROCOMPUTING, 2025, 630