A case study of fairness in generated images of Large Language Models for Software Engineering tasks

Cited by: 0
Authors
Sami, Mansour [1 ]
Sami, Ashkan [1 ]
Barclay, Pete [1 ]
Affiliations
[1] Edinburgh Napier Univ, Comp Sci Subject Grp, Edinburgh, Midlothian, Scotland
Keywords
Large Language Models; bias; gender diversity; Generative images; DALL-E-2;
DOI
10.1109/ICSME58846.2023.00051
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Bias in Large Language Models (LLMs) has significant implications. Because LLMs have revolutionized content creation on the web, their biases can lead to unfair outcomes, lack of inclusivity, reinforcement of stereotypes, and ethical and legal concerns. Notably, OpenAI recently claimed to have introduced a new technique that ensures images of people generated by DALL-E-2 accurately reflect the diversity of the world's population. To investigate bias within the field of Software Engineering, this study used DALL-E-2 to generate images for 56 software engineering tasks. A further objective was to determine the impact of OpenAI's new measures on the images generated for these tasks. Two sets of experiments were conducted: in one set, each task was prefixed with the clause "As a Software Engineer," while in the other set, only the task itself was used. The tasks were phrased in a gender-neutral manner, and the AI was instructed to generate images for each task 20 times. For the stereotypically female-dominated task of performing administrative duties, 40 additional images were generated. The study revealed a large gender bias across the 2,280 generated images. For instance, in the subset of experiments whose prompts explicitly incorporated the phrase "As a software engineer," only 2% of the generated images portrayed female protagonists; male protagonists dominated throughout this setting, and in 45 of the tasks, 100% of the protagonists were male. Notably, images generated without the prefixed clause had more female protagonists only for 'provide comments on project milestones' and 'provide enhancements'; the other tasks did not exhibit a similar pattern. The findings highlight the inadequacy of the implemented guardrails and the importance of further research on assessing LLMs. In particular, further research is needed to determine where LLM guardrails fail so that companies can address those failures properly.
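The experimental protocol described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the task strings, prompt construction, and API call shown here are assumptions (only two of the 56 tasks are named in the abstract, and the actual prompting details are in the paper itself).

```python
# Hypothetical sketch of the study's protocol: each software engineering
# task is rendered as a gender-neutral prompt, with and without the
# "As a Software Engineer" prefix, and each prompt would be sent to
# DALL-E-2 twenty times. Task names below are illustrative examples.

PREFIX = "As a Software Engineer, "
IMAGES_PER_PROMPT = 20  # per the protocol described in the abstract

def build_prompts(tasks, use_prefix):
    """Construct one gender-neutral prompt per task, optionally prefixed."""
    return [(PREFIX if use_prefix else "") + task for task in tasks]

# Two tasks explicitly named in the abstract; the paper lists all 56.
tasks = [
    "provide comments on project milestones",
    "provide enhancements",
]

for use_prefix in (True, False):
    for prompt in build_prompts(tasks, use_prefix):
        # In the real experiment, each prompt would be submitted to the
        # OpenAI image-generation API IMAGES_PER_PROMPT times, e.g.
        #   client.images.generate(model="dall-e-2", prompt=prompt, n=1)
        # and the apparent gender of each image's protagonist recorded.
        pass
```

With 56 tasks, two prompt conditions, and 20 images per prompt, this loop would yield 56 × 2 × 20 = 2,240 images; the 40 extra images for the administrative task bring the total to the 2,280 reported in the abstract.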
Pages: 391-396 (6 pages)