Large language models to identify social determinants of health in electronic health records

Cited by: 0
Authors
Marco Guevara
Shan Chen
Spencer Thomas
Tafadzwa L. Chaunzwa
Idalid Franco
Benjamin H. Kann
Shalini Moningi
Jack M. Qian
Madeleine Goldstein
Susan Harper
Hugo J. W. L. Aerts
Paul J. Catalano
Guergana K. Savova
Raymond H. Mak
Danielle S. Bitterman
Affiliations
[1] Mass General Brigham, Artificial Intelligence in Medicine (AIM) Program
[2] Harvard Medical School, Department of Radiation Oncology
[3] Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Computational Health Informatics Program
[4] Boston Children’s Hospital, Adult Resource Office
[5] Harvard Medical School, Radiology and Nuclear Medicine, GROW & CARIM
[6] Dana-Farber Cancer Institute, Department of Data Science
[7] Maastricht University
[8] Dana-Farber Cancer Institute and Department of Biostatistics
[9] Harvard T. H. Chan School of Public Health
DOI: not available
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71) and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (ΔF1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot settings, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
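The abstract reports macro-F1 and names class imbalance as a challenge. As a minimal sketch of why macro-F1 suits imbalanced labels, the snippet below computes it by hand on toy data. The label names, the toy predictions, and the single-label framing are illustrative assumptions, not the authors' data or evaluation code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: most sentences carry no SDoH mention.
labels = ["none", "housing", "employment"]
y_true = ["none", "none", "none", "none", "housing", "employment"]
y_pred = ["none", "none", "none", "none", "none", "employment"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case the single missed "housing" sentence barely moves accuracy, but it sets that class's F1 to zero and pulls the macro average down sharply. That sensitivity to rare classes is what makes the metric informative for sparsely documented categories.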
Related papers
50 in total
  • [1] Large language models to identify social determinants of health in electronic health records
    Guevara, Marco
    Chen, Shan
    Thomas, Spencer
    Chaunzwa, Tafadzwa L.
    Franco, Idalid
    Kann, Benjamin H.
    Moningi, Shalini
    Qian, Jack M.
    Goldstein, Madeleine
    Harper, Susan
    Aerts, Hugo J. W. L.
    Catalano, Paul J.
    Savova, Guergana K.
    Mak, Raymond H.
    Bitterman, Danielle S.
    [J]. NPJ DIGITAL MEDICINE, 2024, 7 (01)
  • [2] Evaluation of a Natural Language Processing Approach to Identify Social Determinants of Health in Electronic Health Records in a Diverse Community Cohort
    Rouillard, Christopher J.
    Nasser, Mahmoud A.
    Hu, Haihong
    Roblin, Douglas W.
    [J]. MEDICAL CARE, 2022, 60 (03) : 248 - 255
  • [3] Natural language processing to identify social determinants of health in Alzheimer's disease and related dementia from electronic health records
    Wu, Wenbo
    Holkeboer, Kaes J.
    Kolawole, Temidun O.
    Carbone, Lorrie
    Mahmoudi, Elham
    [J]. HEALTH SERVICES RESEARCH, 2023, 58 (06) : 1292 - 1302
  • [4] The shaky foundations of large language models and foundation models for electronic health records
    Michael Wornow
    Yizhe Xu
    Rahul Thapa
    Birju Patel
    Ethan Steinberg
    Scott Fleming
    Michael A. Pfeffer
    Jason Fries
    Nigam H. Shah
    [J]. npj Digital Medicine, 2023, 6 (01)
  • [5] The shaky foundations of large language models and foundation models for electronic health records
    Wornow, Michael
    Xu, Yizhe
    Thapa, Rahul
    Patel, Birju
    Steinberg, Ethan
    Fleming, Scott
    Pfeffer, Michael A.
    Fries, Jason
    Shah, Nigam H.
    [J]. NPJ DIGITAL MEDICINE, 2023, 6 (01)
  • [6] Adding Personal and Social Determinants of Health to Electronic Health Records
    Weissman, Myrna
    Talati, Ardesheer
    Pathak, Jyotishman
    [J]. BIOLOGICAL PSYCHIATRY, 2020, 87 (09) : S69 - S70
  • [7] Social determinants of health: Data standardization in electronic health records
    Cummins, Mollie R.
    Hardiker, Nicholas
    Wang, Jing
    Wilson, Marisa
    Sward, Katherine
    Chernecky, Cynthia
    Roberts, Darryl
    Langford, Laura Heermann
    [J]. NURSING OUTLOOK, 2022, 70 (03) : 528 - 534
  • [8] Integrating Data On Social Determinants Of Health Into Electronic Health Records
    Cantor, Michael N.
    Thorpe, Lorna
    [J]. HEALTH AFFAIRS, 2018, 37 (04) : 585 - 590
  • [9] A large language model for electronic health records
    Xi Yang
    Aokun Chen
    Nima PourNejatian
    Hoo Chang Shin
    Kaleb E. Smith
    Christopher Parisien
    Colin Compas
    Cheryl Martin
    Anthony B. Costa
    Mona G. Flores
    Ying Zhang
    Tanja Magoc
    Christopher A. Harle
    Gloria Lipori
    Duane A. Mitchell
    William R. Hogan
    Elizabeth A. Shenkman
    Jiang Bian
    Yonghui Wu
    [J]. npj Digital Medicine, 2022, 5 (01)
  • [10] A large language model for electronic health records
    Yang, Xi
    Chen, Aokun
    PourNejatian, Nima
    Shin, Hoo Chang
    Smith, Kaleb E.
    Parisien, Christopher
    Compas, Colin
    Martin, Cheryl
    Costa, Anthony B.
    Flores, Mona G.
    Zhang, Ying
    Magoc, Tanja
    Harle, Christopher A.
    Lipori, Gloria
    Mitchell, Duane A.
    Hogan, William R.
    Shenkman, Elizabeth A.
    Bian, Jiang
    Wu, Yonghui
    [J]. NPJ DIGITAL MEDICINE, 2022, 5 (01)