A case study of fairness in generated images of Large Language Models for Software Engineering tasks

Cited by: 0
Authors
Sami, Mansour [1 ]
Sami, Ashkan [1 ]
Barclay, Pete [1 ]
Affiliations
[1] Edinburgh Napier Univ, Comp Sci Subject Grp, Edinburgh, Midlothian, Scotland
Keywords
Large Language Models; bias; gender diversity; Generative images; DALL-E-2;
DOI
10.1109/ICSME58846.2023.00051
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Bias in Large Language Models (LLMs) has significant implications. Because LLMs have revolutionized content creation on the web, their biases can lead to unfair outcomes, lack of inclusivity, reinforcement of stereotypes, and ethical and legal concerns. Notably, OpenAI recently claimed to have introduced a new technique that ensures images of people generated by DALL-E-2 accurately reflect the diversity of the world's population. To investigate bias within the field of Software Engineering, this study used DALL-E-2 to generate images for 56 software engineering tasks. A further objective was to determine the impact of OpenAI's new measures on the images generated for these tasks. Two sets of experiments were conducted: in one set, each task was prefixed with the clause "As a Software Engineer," while in the other set, only the task itself was used. The tasks were phrased in a gender-neutral manner, and the AI was instructed to generate images for each task 20 times. For the stereotypically female-dominated task of performing administrative duties, 40 additional images were generated. The study revealed a large gender bias across the 2,280 generated images. For instance, in the subset of experiments whose prompts explicitly incorporated the phrase "As a software engineer," only 2% of the generated images portrayed female protagonists; male protagonists dominated throughout this setting, and in 45 of the tasks, 100% of the protagonists were male. Notably, images generated without the prefixed clause had more female protagonists only for 'provide comments on project milestones' and 'provide enhancements'; the other tasks did not exhibit a similar pattern. The findings highlight the inadequacy of the implemented guardrails and the importance of further research on assessing LLMs. In particular, further research is needed to determine where LLM guardrails fail so that companies can address those failures properly.
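The experimental protocol described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the task strings, prompt construction, and API call shown here are assumptions (only two of the 56 tasks are named in the abstract, and the actual prompting details are in the paper itself).

```python
# Hypothetical sketch of the study's protocol: each software engineering
# task is rendered as a gender-neutral prompt, with and without the
# "As a Software Engineer" prefix, and each prompt would be sent to
# DALL-E-2 twenty times. Task names below are illustrative examples.

PREFIX = "As a Software Engineer, "
IMAGES_PER_PROMPT = 20  # per the protocol described in the abstract

def build_prompts(tasks, use_prefix):
    """Construct one gender-neutral prompt per task, optionally prefixed."""
    return [(PREFIX if use_prefix else "") + task for task in tasks]

# Two tasks explicitly named in the abstract; the paper lists all 56.
tasks = [
    "provide comments on project milestones",
    "provide enhancements",
]

for use_prefix in (True, False):
    for prompt in build_prompts(tasks, use_prefix):
        # In the real experiment, each prompt would be submitted to the
        # OpenAI image-generation API IMAGES_PER_PROMPT times, e.g.
        #   client.images.generate(model="dall-e-2", prompt=prompt, n=1)
        # and the apparent gender of each image's protagonist recorded.
        pass
```

With 56 tasks, two prompt conditions, and 20 images per prompt, this loop would yield 56 × 2 × 20 = 2,240 images; the 40 extra images for the administrative task bring the total to the 2,280 reported in the abstract.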
Pages: 391-396 (6 pages)