Large language models to identify social determinants of health in electronic health records

Cited by: 52

Authors:

Guevara, Marco [1,2]

Chen, Shan [1,2]

Thomas, Spencer [1,2,3]

Chaunzwa, Tafadzwa L. [1,2]

Franco, Idalid [2]

Kann, Benjamin H. [1,2]

Moningi, Shalini [2]

Qian, Jack M. [1,2]

Goldstein, Madeleine [4]

Harper, Susan [4]

Aerts, Hugo J. W. L. [1,2,5,6]

Catalano, Paul J. [7,8]

Savova, Guergana K. [3]

Mak, Raymond H. [1,2]

Bitterman, Danielle S. [1,2]

Affiliations:

[1] Harvard Med Sch, Artificial Intelligence Med AIM Program, Mass Gen Brigham, Boston, MA 02115 USA

[2] Brigham & Womens Hosp, Dana Farber Canc Inst, Dept Radiat Oncol, Boston, MA 02115 USA

[3] Harvard Med Sch, Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA USA

[4] Dana Farber Canc Inst, Adult Resource Off, Boston, MA USA

[5] Maastricht Univ, Radiol & Nucl Med, GROW, Maastricht, Netherlands

[6] Maastricht Univ, CARIM, Maastricht, Netherlands

[7] Dana Farber Canc Inst, Dept Data Sci, Boston, MA USA

[8] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA

Source:

NPJ DIGITAL MEDICINE | 2024 / Volume 7 / Issue 01

Funding:

European Research Council;

Keywords:

ADVERSE CHILDHOOD EXPERIENCES; UNITED-STATES; SUPPORT; MORTALITY; SURVIVAL; WOMEN;

DOI:

10.1038/s41746-023-00970-0

Chinese Library Classification (CLC) number:

R19 [Health care organizations and services (health services administration)];

Abstract:

Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot settings, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.

Pages: 14
