A case study of fairness in generated images of Large Language Models for Software Engineering tasks

Cited by: 0
Authors
Sami, Mansour [1 ]
Sami, Ashkan [1 ]
Barclay, Pete [1 ]
Affiliations
[1] Edinburgh Napier Univ, Comp Sci Subject Grp, Edinburgh, Midlothian, Scotland
Keywords
Large Language Models; bias; gender diversity; generative images; DALL-E-2
DOI
10.1109/ICSME58846.2023.00051
CLC Classification Code
TP31 [Computer Software]
Subject Classification Code
081202; 0835
Abstract
Bias in Large Language Models (LLMs) has significant implications. Because these models have revolutionized content creation on the web, their biases can lead to unfair outcomes, a lack of inclusivity, the reinforcement of stereotypes, and ethical and legal concerns. Notably, OpenAI recently claimed to have introduced a new technique to ensure that DALL-E-2 generates images of people that accurately reflect the diversity of the world's population. To investigate bias within the field of Software Engineering, this study used DALL-E-2 to generate images for 56 software engineering tasks. A further objective was to determine the impact of OpenAI's new measures on the images generated for these tasks. Two sets of experiments were conducted: in one set, each task was prefixed with the clause "As a Software Engineer," while in the other, only the task itself was used. The tasks were phrased in a gender-neutral manner, and the model was instructed to generate images for each task 20 times. For 'doing administrative tasks', a stereotypically female-dominated activity, 40 additional images were generated. The study revealed a large gender bias in the 2,280 images generated. For instance, in the subset of experiments whose prompts explicitly incorporated the phrase "As a Software Engineer," only 2% of the generated images portrayed female protagonists; male protagonists dominated throughout this setting, and in 45 tasks 100% of the protagonists were male. Notably, images generated without the prefixed clause showed more female protagonists only for 'provide comments on project milestones' and 'provide enhancements'; the other tasks exhibited no such pattern. The findings highlight the inadequacy of the implemented guardrails and the importance of further assessment of LLMs. More research is needed to find out where LLM guardrails fail so that companies can address those failures properly.
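The record does not include the study's scripts, but the prompting protocol described in the abstract is straightforward to sketch. The following minimal Python sketch shows how such an experiment could be replicated; it assumes the OpenAI Python SDK (openai >= 1.0) with an API key in the environment, and the example tasks (other than the two named in the abstract) and the 512x512 image size are illustrative placeholders, not taken from the paper.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative gender-neutral tasks; the paper used 56 such tasks,
# which are not enumerated in this record. The first entry is a
# hypothetical placeholder; the other two are named in the abstract.
TASKS = [
    "review a pull request",
    "provide comments on project milestones",
    "provide enhancements",
]

IMAGES_PER_TASK = 20  # the study generated 20 images per task per setting

def generate_images(task, with_prefix):
    """Generate images for one task, with or without the role prefix."""
    prompt = f"As a Software Engineer, {task}" if with_prefix else task
    urls = []
    # DALL-E-2 accepts at most 10 images per request, so batch two calls.
    for _ in range(IMAGES_PER_TASK // 10):
        response = client.images.generate(
            model="dall-e-2", prompt=prompt, n=10, size="512x512"
        )
        urls.extend(image.url for image in response.data)
    return urls

# Run both experimental settings described in the abstract.
for task in TASKS:
    for with_prefix in (True, False):
        generate_images(task, with_prefix)

The gender of each portrayed protagonist would then be labeled per image; the abstract does not indicate that this annotation step was automated.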
Pages: 391-396
Page count: 6
相关论文
共 50 条
  • [21] Ontology engineering with Large Language Models
    Mateiu, Patricia
    Groza, Adrian
    [J]. 2023 25TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, SYNASC 2023, 2023, : 226 - 229
  • [22] Robustness of GPT Large Language Models on Natural Language Processing Tasks
    Xuanting, Chen
    Junjie, Ye
    Can, Zu
    Nuo, Xu
    Tao, Gui
    Qi, Zhang
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): : 1128 - 1142
  • [23] Industrial Engineering with Large Language Models: A case study of ChatGPT's performance on Oil & Gas problems
    Ogundare, Oluwatosin
    Madasu, Srinath
    Wiggins, Nathanial
    [J]. 2023 11TH INTERNATIONAL CONFERENCE ON CONTROL, MECHATRONICS AND AUTOMATION, ICCMA, 2023, : 458 - 461
  • [24] InteNSE: Interpretability, Robustness, and Benchmarking in Neural Software Engineering (Second Edition: Large Language Models)
    University of Illinois, Urbana-Champaign, United States
    不详
    不详
    不详
    [J]. Proc. - IEEE/ACM Int. Workshop Interpretability, Robust., Benchmarking Neural Softw. Eng. InteNSE, (VI):
  • [25] AI-Tutoring in Software Engineering Education Experiences with Large Language Models in Programming Assessments
    Frankford, Eduard
    Sauerwein, Clemens
    Bassner, Patrick
    Krusche, Stephan
    Breu, Ruth
    [J]. 2024 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING EDUCATION AND TRAINING, ICSE-SEET 2024, 2024, : 309 - 319
  • [26] She Elicits Requirements and He Tests: Software Engineering Gender Bias in Large Language Models
    Treude, Christoph
    Hata, Hideaki
    [J]. 2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 624 - 629
  • [27] Preventing and Detecting Misinformation Generated by Large Language Models
    Liu, Aiwei
    Sheng, Qiang
    Hu, Xuming
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 3001 - 3004
  • [28] An Empirical Study on How Large Language Models Impact Software Testing Learning
    Mezzaro, Simone
    Gambi, Alessio
    Fraser, Gordon
    [J]. PROCEEDINGS OF 2024 28TH INTERNATION CONFERENCE ON EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING, EASE 2024, 2024, : 555 - 564
  • [29] Toward Optimal Selection of Information Retrieval Models for Software Engineering Tasks
    Rahman, Md Masudur
    Chakraborty, Saikat
    Kaiser, Gail
    Ray, Baishakhi
    [J]. 2019 19TH IEEE INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM), 2019, : 127 - 138
  • [30] Multimodal large language models for inclusive collaboration learning tasks
    Lewis, Armanda
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 202 - 210