Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models

Cited by: 1
Authors
Elsharif, Wala [1 ]
She, James [1 ]
Nakov, Preslav [2 ]
Wong, Simon [3 ]
Affiliations
[1] Hamad Bin Khalifa University, Ar Rayyan, Qatar
[2] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
[3] HKUST, Hong Kong, China
Keywords
Arabic culture; Prompt engineering; GPT; Integrated systems;
DOI
10.1145/3573381.3596466
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
With the current and continuous advances in text-to-image modeling, it has become critical to design prompts that make the best use of these models' capabilities and guide them toward generating the most desirable images; thus the field of prompt engineering has emerged. Here, we study how prompt engineering can be used to enhance the representation of Arabic culture in text-to-image models. This work proposes a simple, novel approach to prompt engineering that uses the domain knowledge of a state-of-the-art language model, GPT, to perform prompt augmentation: a simple initial prompt is expanded by a GPT model, through a process known as in-context learning, into multiple more detailed prompts related to Arabic culture across multiple categories. The augmented prompts are then used to generate images that better reflect Arabic culture. We perform multiple experiments with a number of participants to evaluate the proposed method, which shows promising results, especially for generating prompts that are more inclusive of the different Arab countries and offer a wider variety of image subjects: compared to the direct approach, our method generates images with more variety 85% of the time and images that are more inclusive of the Arab countries more than 72.66% of the time.
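As a rough illustration of the pipeline described in the abstract, the sketch below shows how a short seed prompt might be augmented through in-context learning with a GPT-style model before being handed to a text-to-image model. The helper names (build_augmentation_prompt, call_gpt, augment_prompt), the few-shot examples, the category, and the canned response are illustrative assumptions, not the authors' implementation; a real system would replace call_gpt with an actual GPT API call and pass each augmented prompt to a text-to-image model.

```python
# Minimal sketch of prompt augmentation via in-context learning.
# All names and examples here are hypothetical stand-ins, not the paper's code.

from typing import List

# Few-shot examples that steer the language model toward detailed,
# Arabic-culture-aware prompts (illustrative content only).
IN_CONTEXT_EXAMPLES = """\
Seed: a market
Augmented: a bustling traditional souq with spice stalls and hanging lanterns

Seed: a house
Augmented: a Qatari courtyard house with a carved wooden door and a wind tower
"""


def build_augmentation_prompt(seed: str, category: str, n: int) -> str:
    """Compose the in-context learning prompt from a seed prompt and a category."""
    return (
        "Expand the seed prompt into detailed image prompts that reflect Arabic culture.\n"
        f"Category: {category}\n\n{IN_CONTEXT_EXAMPLES}\n"
        f"Seed: {seed}\nGive {n} augmented prompts, one per line."
    )


def call_gpt(prompt: str) -> str:
    """Placeholder for a real GPT API call; returns a canned response here."""
    return (
        "a traditional Emirati majlis with ornate cushions and Arabic coffee\n"
        "an Omani fort at sunset overlooking a date palm oasis"
    )


def augment_prompt(seed: str, category: str = "architecture", n: int = 2) -> List[str]:
    """Return n more detailed prompts derived from the seed via the language model."""
    response = call_gpt(build_augmentation_prompt(seed, category, n))
    return [line.strip() for line in response.splitlines() if line.strip()]


if __name__ == "__main__":
    for p in augment_prompt("a living room"):
        # Each augmented prompt would then be sent to a text-to-image model.
        print(p)
```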
Pages: 276-288
Page count: 13