Prompt Refinement with Image Pivot for Text-to-Image Generation

被引:0
|
作者
Zhan, Jingtao [1 ]
Ai, Qingyao [1 ]
Liu, Yiqun [1 ]
Pan, Yingwei [2 ]
Yao, Ting [2 ]
Mao, Jiaxin [3 ]
Ma, Shaoping [1 ]
Mei, Tao [2 ]
机构
[1] Tsinghua Univ, Dept Comp Sci & Technol, Zhongguancun Lab, Beijing, Peoples R China
[2] HiDream Ai, Beijing, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner(1).
引用
收藏
页码:941 / 954
页数:14
相关论文
共 50 条
  • [1] A taxonomy of prompt modifiers for text-to-image generation
    Oppenlaender, Jonas
    BEHAVIOUR & INFORMATION TECHNOLOGY, 2024, 43 (15) : 3763 - 3776
  • [2] Prompt Stealing Attacks Against Text-to-Image Generation Models
    Shen, Xinyue
    Qu, Yiting
    Backes, Michael
    Zhang, Yang
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024, : 5823 - 5840
  • [3] Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
    Zhan, Jingtao
    Ai, Qingyao
    Liu, Yiqun
    Chen, Jia
    Ma, Shaoping
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2145 - 2155
  • [4] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
  • [6] PROMPTIST: Automated Prompt Optimization for Text-to-Image Synthesis
    Li, WeiJie
    Wane, Jin
    Zhang, Xuejie
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 295 - 306
  • [7] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7511 - 7522
  • [8] Development and Classification of Image Dataset for Text-to-Image Generation
    Kumar M.
    Mittal M.
    Singh S.
    Journal of The Institution of Engineers (India): Series B, 2024, 105 (04) : 787 - 796
  • [9] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
    Sah, Shagan
    Peri, Dheeraj
    Shringi, Ameya
    Zhang, Chi
    Dominguez, Miguel
    Savakis, Andreas
    Ptucha, Ray
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 3783 - 3787
  • [10] PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation
    Feng, Yingchaojie
    Wang, Xingbo
    Wong, Kam Kwai
    Wang, Sijia
    Lu, Yuhong
    Zhu, Minfeng
    Wang, Baicheng
    Chen, Wei
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (01) : 295 - 305