Large language models to identify social determinants of health in electronic health records

Citations: 52
Authors
Guevara, Marco [1, 2]
Chen, Shan [1, 2]
Thomas, Spencer [1, 2, 3]
Chaunzwa, Tafadzwa L. [1, 2]
Franco, Idalid [2]
Kann, Benjamin H. [1, 2]
Moningi, Shalini [2]
Qian, Jack M. [1, 2]
Goldstein, Madeleine [4]
Harper, Susan [4]
Aerts, Hugo J. W. L. [1, 2, 5, 6]
Catalano, Paul J. [7, 8]
Savova, Guergana K. [3]
Mak, Raymond H. [1, 2]
Bitterman, Danielle S. [1, 2]
Affiliations
[1] Harvard Med Sch, Artificial Intelligence Med AIM Program, Mass Gen Brigham, Boston, MA 02115 USA
[2] Brigham & Womens Hosp, Dana Farber Canc Inst, Dept Radiat Oncol, Boston, MA 02115 USA
[3] Harvard Med Sch, Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA USA
[4] Dana Farber Canc Inst, Adult Resource Off, Boston, MA USA
[5] Maastricht Univ, Radiol & Nucl Med, GROW, Maastricht, Netherlands
[6] Maastricht Univ, CARIM, Maastricht, Netherlands
[7] Dana Farber Canc Inst, Dept Data Sci, Boston, MA USA
[8] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
Funding
European Research Council;
Keywords
ADVERSE CHILDHOOD EXPERIENCES; UNITED-STATES; SUPPORT; MORTALITY; SURVIVAL; WOMEN;
DOI
10.1038/s41746-023-00970-0
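Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Subject classification code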
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot setting, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
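The abstract reports macro-F1 and names class imbalance as a challenge. As a minimal sketch of why that metric suits sparse labels, the Python below computes macro-F1 as the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class that is never predicted correctly scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up example: a rare "housing" label that the model always misses.
y_true = ["none"] * 8 + ["housing"] * 2
y_pred = ["none"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro-F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy stays high even though the rare class is never found, while macro-F1 drops because every class counts equally.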
Pages: 14
Related papers
50 items in total
  • [31] Moving Electronic Medical Records Upstream Incorporating Social Determinants of Health
    Gottlieb, Laura M.
    Tirozzi, Karen J.
    Manchanda, Rishi
    Burns, Abby R.
    Sandel, Megan T.
    AMERICAN JOURNAL OF PREVENTIVE MEDICINE, 2015, 48 (02) : 215 - 218
  • [32] Social Determinants Documentation in Electronic Health Records With and Without Standardized Terminologies
    Monsen, Karen A.
    Kapinos, Nicole
    Rudenick, Joyce M.
    Warmbold, Kathryn
    McMahon, Siobhan K.
    Schorr, Erica N.
    WESTERN JOURNAL OF NURSING RESEARCH, 2016, 38 (10) : 1399 - 1400
  • [33] Social determinants of health in electronic health records and their impact on analysis and risk prediction: A systematic review
    Chen, Min
    Tan, Xuan
    Padman, Rema
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2020, 27 (11) : 1764 - 1773
  • [34] Quality of electronic health records: Variability of missing data for social determinants of health by healthcare systems
    Jaffe, Dena H.
    Ruo, Philip
    Montgomery, Sam
    Hoover, Chaundra
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2023, 32 : 278 - 279
  • [35] Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing
    Han, Sifei
    Zhang, Robert F.
    Shi, Lingyun
    Richie, Russell
    Liu, Haixia
    Tseng, Andrew
    Quan, Wei
    Ryan, Neal
    Brent, David
    Tsui, Fuchiang R.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 127
  • [36] Scalable information extraction from free text electronic health records using large language models
    Gu, Bowen
    Shao, Vivian
    Liao, Ziqian
    Carducci, Valentina
    Brufau, Santiago Romero
    Yang, Jie
    Desai, Rishi J.
    BMC MEDICAL RESEARCH METHODOLOGY, 2025, 25 (01)
  • [37] Using Natural Language Processing to Identify Different Lens Pathology in Electronic Health Records
    Stein, Joshua D.
    Zhou, Yunshu
    Andrews, Chris A.
    Kim, Judy E.
    Addis, Victoria
    Bixler, Jill
    Grove, Nathan
    Mcmillan, Brian
    Munir, Saleha Z.
    Pershing, Suzann
    Schultz, Jeffrey S.
    Stagg, Brian C.
    Wang, Sophia Y.
    Woreta, Fasika
    AMERICAN JOURNAL OF OPHTHALMOLOGY, 2024, 262 : 153 - 160
  • [38] SOCIAL NETWORKS IN ELECTRONIC HEALTH RECORDS
    Tu, Shin-Ping
    Yao, Nengliang
    Zhu, Xi
    Mishra, Vimal
    Phillips, Allison E.
    Dow, Alan
    JOURNAL OF GENERAL INTERNAL MEDICINE, 2016, 31 : S399 - S399
  • [39] A public health perspective on using electronic health records to address social determinants of health: The potential for a national system of local community health records in the United States
    Hatef, Elham
    Weiner, Jonathan P.
    Kharrazi, Hadi
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2019, 124 : 86 - 89
  • [40] Natural language generation for electronic health records
    Lee, Scott H.
    NPJ DIGITAL MEDICINE, 2018, 1