What's in a Name? Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT

Citations: 6
Authors
Kaplan, Deanna M. [1 ,7 ]
Palitsky, Roman [2 ]
Alvarez, Santiago J. Arconada [3 ]
Pozzo, Nicole S. [1 ]
Greenleaf, N. [3 ]
Atkinson, Ciara A. [4 ]
Lam, Wilbur A. [5 ,6 ]
Affiliations
[1] Emory Univ, Sch Med, Dept Family & Prevent Med, Atlanta, GA 30329 USA
[2] Emory Univ, Woodruff Hlth Sci Ctr, Emory Spiritual Hlth, Atlanta, GA 30329 USA
[3] Emory Univ, Sch Med, Atlanta, GA USA
[4] Univ Arizona, Dept Campus Recreat, Tucson, AZ USA
[5] Georgia Inst Technol, Wallace H Coulter Dept Biomed Engn, Atlanta, GA USA
[6] Emory Univ, Atlanta, GA USA
[7] Emory Univ, Sch Med, Dept Family & Prevent Med, Adm Off, Wesley Woods Campus,1841 Clifton Rd NE,5th Floor, Atlanta, GA 30329 USA
Keywords
chatbot; generative artificial intelligence; generative AI; gender bias; large language models; letters of recommendation; recommendation letter; language model; chatbots; artificial intelligence; AI; gender-based language; human-written; real-world; scenario; stereotypes; female
DOI
10.2196/51837
CLC Number
R19 [Health Care Organization and Services (Health Service Management)]
Abstract
Background: Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT's underlying language model a serious concern.

Objective: Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count (LIWC) to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters).

Methods: We conducted analyses using 22 existing LIWC dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular "male" and "female" names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts. We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names).

Results: Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names.

Conclusions: ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios.

Trial Registration: OSF Registries osf.io/ztv96; https://osf.io/ztv96
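To make the pipeline described in the abstract concrete, here is a minimal sketch in Python of the same three steps: generate a letter per name, score each letter against word-category dictionaries in the spirit of LIWC, and compare historically female versus male names for both mean differences (Welch's t test) and unequal variance (Levene's test, corresponding to the heteroscedasticity result). This is not the authors' code: the study used human ChatGPT sessions and the proprietary LIWC dictionaries, so the prompt text, the model name, and the tiny AGENTIC and COMMUNAL word lists below are illustrative placeholders.

```python
# Minimal sketch of the study's analysis pipeline (not the authors' code).
# Assumptions: the `openai` Python client (v1+) is installed and configured,
# and the word lists below are toy stand-ins for LIWC dictionaries.
import re
from openai import OpenAI
from scipy.stats import levene, ttest_ind

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative stand-ins for two of the study's 28 language categories.
AGENTIC = {"assertive", "confident", "independent", "ambitious", "leader"}
COMMUNAL = {"warm", "caring", "helpful", "kind", "supportive"}

# Hypothetical prompt; the study used 3 prompts varying stereotype content.
PROMPT = "Write a recommendation letter for {name}, an applicant to a lab manager position."

def generate_letter(name: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask a chat model for a recommendation letter for one name."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(name=name)}],
    )
    return response.choices[0].message.content

def category_rate(text: str, dictionary: set[str]) -> float:
    """Percentage of words in `text` found in `dictionary` (LIWC-style score)."""
    words = re.findall(r"[a-z']+", text.lower())
    return 100 * sum(w in dictionary for w in words) / max(len(words), 1)

def compare(female_names: list[str], male_names: list[str], dictionary: set[str]):
    f_scores = [category_rate(generate_letter(n), dictionary) for n in female_names]
    m_scores = [category_rate(generate_letter(n), dictionary) for n in male_names]
    # Welch's t test: do mean category rates differ between name groups?
    t = ttest_ind(f_scores, m_scores, equal_var=False)
    # Levene's test: do the groups differ in variance (heteroscedasticity)?
    lv = levene(f_scores, m_scores)
    return t, lv

if __name__ == "__main__":
    t, lv = compare(["Emma", "Olivia"], ["Liam", "Noah"], COMMUNAL)
    print(f"Welch t={t.statistic:.2f} (p={t.pvalue:.3f}); "
          f"Levene W={lv.statistic:.2f} (p={lv.pvalue:.3f})")
```

Swapping in the full dictionary set, the 200 names per group, and the 3 prompt variants would reproduce the shape of the study's design: one mean-difference and one variance comparison per dictionary and prompt.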
Pages: 14