Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models

Cited by: 1
Authors
Elsharif, Wala [1]
She, James [1]
Nakov, Preslav [2]
Wong, Simon [3]
Affiliations
[1] Hamad Bin Khalifa Univ, Ar Rayyan, Qatar
[2] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] HKUST, Hong Kong, Peoples R China
Keywords
Arabic culture; Prompt engineering; GPT; Integrated systems
DOI
10.1145/3573381.3596466
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With the continuing advances in text-to-image modeling, it has become critical to design prompts that make the best of these models' capabilities and guide them to generate the most desirable images, and thus the field of prompt engineering has emerged. Here, we study a method that uses prompt engineering to enhance text-to-image models' representation of the Arabic culture. This work proposes a simple, novel approach to prompt engineering that uses the domain knowledge of a state-of-the-art language model, GPT, to perform prompt augmentation: through in-context learning, a GPT model turns a simple initial prompt into multiple, more detailed prompts related to the Arabic culture, drawn from multiple categories. The augmented prompts are then used to generate images enhanced for the Arabic culture. We perform multiple experiments with a number of participants to evaluate the performance of the proposed method, which shows promising results, especially for generating prompts that are more inclusive of the different Arabic countries and that cover a wider variety of image subjects. Compared to the direct approach, we find that our proposed method generates images with more variety 85% of the time and images that are more inclusive of the Arabic countries more than 72.66% of the time.
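The prompt-augmentation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the categories, the example prompts, the request wording, and the names `EXAMPLES_BY_CATEGORY`, `build_request`, `augment_prompt`, and `fake_complete` are all assumptions made for illustration. The language model is passed in as a plain callable, so no particular GPT API is assumed.

```python
from typing import Callable, Dict, List

# Hypothetical categories and in-context examples, invented for illustration.
# The record does not list the categories or examples used in the paper.
EXAMPLES_BY_CATEGORY: Dict[str, List[str]] = {
    "architecture": ["a courtyard house with carved wooden screens in Cairo, Egypt"],
    "food": ["a family sharing a large platter of couscous in Marrakesh, Morocco"],
}


def build_request(initial_prompt: str, category: str,
                  examples: List[str], n_outputs: int) -> str:
    """Assemble a few-shot (in-context learning) request for one category."""
    shots = "\n".join(f"- {example}" for example in examples)
    return (
        f"Rewrite the prompt below into {n_outputs} detailed text-to-image "
        f"prompts about Arabic culture in the category '{category}'.\n"
        f"Example prompts for this category:\n{shots}\n"
        f"Prompt: {initial_prompt}\n"
        "Return one prompt per line."
    )


def augment_prompt(initial_prompt: str,
                   examples_by_category: Dict[str, List[str]],
                   complete: Callable[[str], str],
                   n_outputs: int = 3) -> Dict[str, List[str]]:
    """Return up to n_outputs augmented prompts per category.

    `complete` is any function mapping a request string to the model's reply.
    """
    augmented: Dict[str, List[str]] = {}
    for category, examples in examples_by_category.items():
        reply = complete(build_request(initial_prompt, category, examples, n_outputs))
        # Drop list markers and surrounding whitespace, then skip empty lines.
        lines = [line.strip(" -\t") for line in reply.splitlines()]
        augmented[category] = [line for line in lines if line][:n_outputs]
    return augmented


def fake_complete(request: str) -> str:
    """Stand-in for a language model call; returns canned text so the sketch runs offline."""
    return "- a street market at sunset\n- a wedding celebration\n"


if __name__ == "__main__":
    result = augment_prompt("a city", EXAMPLES_BY_CATEGORY, fake_complete)
    for category, prompts in result.items():
        print(category, prompts)
```

In practice `fake_complete` would be replaced by a call to a real language model, and each augmented prompt would then be passed to a text-to-image model.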
Pages: 276-288
Number of pages: 13
Related papers
50 in total
  • [41] Enhancing Reinforcement Learning Finetuned Text-to-Image Generative Model Using Reward Ensemble
    Back, Kyungryul
    Piao, XinYu
    Kim, Jong-Kook
    GENERATIVE INTELLIGENCE AND INTELLIGENT TUTORING SYSTEMS, PT II, ITS 2024, 2024, 14799 : 213 - 224
  • [42] Automated Generation of Lung Cytological Images from Image Findings Using Text-to-Image Technology
    Teramoto, Atsushi
    Kiriyama, Yuka
    Michiba, Ayano
    Yazawa, Natsuki
    Tsukamoto, Tetsuya
    Imaizumi, Kazuyoshi
    Fujita, Hiroshi
    COMPUTERS, 2024, 13 (11)
  • [43] Using artificial intelligence in craft education: crafting with text-to-image generative models
    Vartiainen, Henriikka
    Tedre, Matti
    DIGITAL CREATIVITY, 2023, 34 (01) : 1 - 21
  • [44] Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion
    Li, Jie S.
    Shiue, Yow-Ting
    Shih, Yong-Siang
    Geiping, Jonas
    17TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2023, 2023, : 44 - 49
  • [45] Enhancing Baidu Multimodal Advertisement with Chinese Text-to-Image Generation via Bilingual Alignment and Caption Synthesis
    Zhao, Kang
    Zhao, Xinyu
    Jin, Zhipeng
    Yang, Yi
    Tao, Wen
    Han, Cong
    Li, Shuanglong
    Liu, Lin
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2855 - 2859
  • [46] DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3020 - 3031
  • [47] MaxFusion: Plug&Play Multi-modal Generation in Text-to-Image Diffusion Models
    Nair, Nithin Gopalakrishnan
    Valanarasu, Jeya Maria Jose
    Patel, Vishal M.
    COMPUTER VISION-ECCV 2024, PT XXXVIII, 2025, 15096 : 93 - 110
  • [48] DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
    Sha, Zeyang
    Li, Zheng
    Yu, Ning
    Zhang, Yang
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 3418 - 3432
  • [49] Swinv2-Imagen: hierarchical vision transformer diffusion models for text-to-image generation
    Li, Ruijun
    Li, Weihua
    Yang, Yi
    Wei, Hanyu
    Jiang, Jianhua
    Bai, Quan
    NEURAL COMPUTING & APPLICATIONS, 2023, 36 (28) : 17245 - 17260
  • [50] Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception
    Yamac, Aylin
    Genc, Dilan
    Zaman, Esra
    Gerschner, Felix
    Klaiber, Marco
    Theissler, Andreas
    2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024, 2024, : 1659 - 1664