Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing

Times cited: 38
Authors
Han, Sifei [1]
Zhang, Robert F. [1, 6]
Shi, Lingyun [1]
Richie, Russell [1]
Liu, Haixia [2]
Tseng, Andrew [3]
Quan, Wei [4]
Ryan, Neal [5]
Brent, David [5]
Tsui, Fuchiang R. [1, 6]
Affiliations
[1] Children's Hospital of Philadelphia, Department of Biomedical and Health Informatics, Tsui Lab, Philadelphia, PA 19104, USA
[2] Central South University, Changsha, Hunan, People's Republic of China
[3] Touro University Nevada, Henderson, NV, USA
[4] New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
[5] University of Pittsburgh, Department of Psychiatry, Pittsburgh, PA, USA
[6] University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
Funding
Andrew Mellon Foundation (USA)
Keywords
Social determinants of health; Natural language processing; Deep learning; Electronic health records; Model; Prediction
DOI
10.1016/j.jbi.2021.103984
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Objective: Social determinants of health (SDOH) are non-medical factors that can profoundly affect patient health outcomes. However, SDOH are rarely captured in structured electronic health record (EHR) data such as diagnosis codes. They are more commonly documented in unstructured narrative clinical notes. Identifying social context from unstructured EHR data has therefore become increasingly important. Yet previous work on using natural language processing (NLP) to automate the extraction of SDOH from text has two limitations: it (a) usually focuses on an ad hoc selection of SDOH and (b) does not use the latest advances in deep learning. Our objective was to advance the automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH categories based on standard biomedical and psychiatric ontologies and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes.

Design: A retrospective cohort study.

Setting and participants: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social-related sentences from 2,670 clinical notes.

Methods: We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the "Social Work" category in the MIMIC-III Clinical Database. Using two standard terminologies, SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for them. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures for automated detection of eight SDOH categories: a convolutional neural network (CNN), a long short-term memory (LSTM) network, and Bidirectional Encoder Representations from Transformers (BERT). We also compared these DNNs with three baseline models: (1) cTAKES, (2) L2-regularized logistic regression, and (3) random forests. Both of the last two baselines were trained on bag-of-words features.
Model evaluation metrics included micro- and macro-averaged F1 and the area under the receiver operating characteristic curve (AUC).

Results: All three DNN models classified every SDOH category with a minimum micro-F1 of 0.632 and a minimum macro-AUC of 0.854. Compared with the CNN and LSTM, BERT performed best on most key metrics (micro-F1 = 0.690, macro-AUC = 0.907). BERT identified the "occupational" category most effectively (F1 = 0.774, AUC = 0.965) and the "non-SDOH" category least effectively (F1 = 0.491, AUC = 0.788). BERT outperformed cTAKES in distinguishing social from non-social sentences (F1 = 0.87 vs. 0.06). It also outperformed L2-regularized logistic regression (micro-F1 = 0.649, macro-AUC = 0.696) and random forests (micro-F1 = 0.502, macro-AUC = 0.523) trained on bags-of-words.

Conclusions: Our framework, using DNN models, improved performance in identifying a systematic range of SDOH categories from clinical notes in the EHR. Better identification of patients' SDOH may in turn improve healthcare outcomes.
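The abstract reports both micro- and macro-averaged scores. The pure-Python sketch below shows how the two averages differ by computing micro-F1, macro-F1 and macro-AUC on a tiny hypothetical dataset. It is not the authors' evaluation code, which the abstract does not describe. The sketch rests on several assumptions. Each sentence may carry several SDOH labels (multi-label), predictions use a fixed 0.5 decision threshold, and per-category AUCs are averaged without weights. The data, category names and function names are all illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: rows are sentences, columns are three illustrative
# SDOH categories (for example "occupational", "housing", "non-SDOH").
Y_TRUE: List[List[int]] = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
# Hypothetical classifier scores (e.g. per-label sigmoid outputs), same shape.
Y_SCORE: List[List[float]] = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.6],
    [0.6, 0.15, 0.3],
    [0.2, 0.1, 0.7],
]


def binarize(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label scores into 0/1 predictions with a fixed (assumed) threshold."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred) -> Tuple[float, float]:
    """Micro-F1 pools TP/FP/FN over all labels; macro-F1 averages per-label F1."""
    n_labels = len(y_true[0])
    total_tp = total_fp = total_fn = 0
    per_label: List[float] = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        per_label.append(f1_from_counts(tp, fp, fn))
        total_tp += tp
        total_fp += fp
        total_fn += fn
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    macro = sum(per_label) / n_labels
    return micro, macro


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 0.5), which equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score) -> float:
    """Unweighted mean of the per-label AUCs."""
    n_labels = len(y_true[0])
    aucs = [
        binary_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


if __name__ == "__main__":
    micro, macro = micro_macro_f1(Y_TRUE, binarize(Y_SCORE))
    print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
    print(f"macro-AUC = {macro_auc(Y_TRUE, Y_SCORE):.3f}")
```

On this toy data the script prints micro-F1 = 0.800, macro-F1 = 0.778 and macro-AUC = 0.917. Micro-F1 pools counts across categories, so frequent categories dominate it. Macro-F1 and macro-AUC weight every category equally, so a single weak category lowers the macro scores more than the micro score.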
Pages: 11