Large language models to identify social determinants of health in electronic health records

Authors
Marco Guevara
Shan Chen
Spencer Thomas
Tafadzwa L. Chaunzwa
Idalid Franco
Benjamin H. Kann
Shalini Moningi
Jack M. Qian
Madeleine Goldstein
Susan Harper
Hugo J. W. L. Aerts
Paul J. Catalano
Guergana K. Savova
Raymond H. Mak
Danielle S. Bitterman
Affiliations
[1] Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School
[2] Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute
[3] Computational Health Informatics Program, Boston Children’s Hospital, Harvard Medical School
[4] Adult Resource Office, Dana-Farber Cancer Institute
[5] Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University
[6] Department of Data Science, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T. H. Chan School of Public Health
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and fine-tuned Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of the smaller Flan-T5 models (ΔF1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot settings, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs to improve real-world evidence on SDoH and to help identify patients who could benefit from resource support.
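The abstract reports macro-F1 and names class imbalance as a challenge. As a minimal sketch of why macro-F1 suits sparsely documented labels, the Python below computes it on a toy single-label example. The label names, the toy data, and the helper names `macro_f1` and `accuracy` are illustrative assumptions, not the authors' evaluation code, and the single-label setup is a simplification of the task described in the abstract.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and gold label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, heavily imbalanced toy data: most sentences mention no SDoH.
y_true = ["none"] * 8 + ["housing", "employment"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["none"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

On this toy data the always-"none" classifier still agrees with 8 of 10 gold labels, yet its macro-F1 is much lower because the two rare classes each contribute an F1 of 0 to the average. Each class counts equally regardless of how often it occurs, which is why the metric is a common choice when the classes of interest are rare.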