Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models

Cited by: 1
Authors
Elsharif, Wala [1]
She, James [1]
Nakov, Preslav [2]
Wong, Simon [3]
Affiliations
[1] Hamad Bin Khalifa Univ, Ar Rayyan, Qatar
[2] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] HKUST, Hong Kong, Peoples R China
Keywords
Arabic culture; Prompt engineering; GPT; Integrated systems
DOI
10.1145/3573381.3596466
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With continuing advances in text-to-image modeling, it has become critical to design prompts that make the best of these models' capabilities and guide them to generate the most desirable images; the field of prompt engineering has emerged in response. Here, we study a method that uses prompt engineering to enhance how text-to-image models represent Arabic culture. This work proposes a simple, novel approach to prompt engineering that uses the domain knowledge of a state-of-the-art language model, GPT, to perform prompt augmentation: through a process known as in-context learning, a GPT model takes a simple initial prompt and generates multiple, more detailed prompts related to Arabic culture across multiple categories. The augmented prompts are then used to generate images enhanced for Arabic culture. We perform multiple experiments with a number of participants to evaluate the proposed method, which shows promising results, especially in generating prompts that are more inclusive of the different Arabic countries and that cover a wider variety of image subjects: compared to the direct approach, our proposed method generates images with more variety 85% of the time and is more inclusive of the Arabic countries more than 72.66% of the time.
Pages: 276-288
Number of pages: 13
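
The prompt-augmentation step described in the abstract (a simple initial prompt expanded into several detailed, category-specific prompts through in-context learning) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the instruction wording, the example demonstrations, the category names, the `category: prompt` output format, and the helper names `build_augmentation_prompt` and `parse_augmented_prompts` are all hypothetical, since the abstract does not specify them. The call to a GPT model is deliberately left out; the sketch only builds the few-shot text and parses a completion.

```python
from typing import Dict, List

# Hypothetical in-context demonstrations. The abstract does not give the
# actual examples or categories, so these are placeholders only.
FEW_SHOT_EXAMPLES: Dict[str, List[str]] = {
    "a street market": [
        "architecture: a covered souq with carved wooden doors and brass lanterns",
        "food: a stall piled with dates, spices, and fresh flatbread",
    ],
}


def build_augmentation_prompt(initial_prompt: str, categories: List[str]) -> str:
    """Assemble a few-shot prompt asking a language model to expand
    `initial_prompt` into detailed prompts, one line per category."""
    lines = [
        "Expand each simple prompt into detailed text-to-image prompts "
        "related to Arabic culture.",
        "Cover these categories: " + ", ".join(categories) + ".",
        "Write one line per prompt in the form 'category: detailed prompt'.",
        "",
    ]
    # Each demonstration shows a simple prompt followed by its expansions.
    for simple, detailed in FEW_SHOT_EXAMPLES.items():
        lines.append(f"Simple prompt: {simple}")
        lines.extend(detailed)
        lines.append("")
    # The model is expected to continue after this final line.
    lines.append(f"Simple prompt: {initial_prompt}")
    return "\n".join(lines)


def parse_augmented_prompts(completion: str) -> Dict[str, List[str]]:
    """Group 'category: prompt' lines from a model completion by category.
    Lines without a colon, or with an empty side, are ignored."""
    grouped: Dict[str, List[str]] = {}
    for line in completion.splitlines():
        category, sep, text = line.partition(":")
        category, text = category.strip().lower(), text.strip()
        if sep and category and text:
            grouped.setdefault(category, []).append(text)
    return grouped


if __name__ == "__main__":
    print(build_augmentation_prompt("a wedding", ["clothing", "food"]))
```

Each parsed prompt would then be passed to a text-to-image model in place of the original simple prompt. Keeping prompt construction and parsing separate from the model call makes both pieces easy to check without network access.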