Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis

Cited by: 6
Authors
Rybinski, Maciej [1]
Dai, Xiang [1, 2]
Singh, Sonit [1, 3]
Karimi, Sarvnaz [1]
Nguyen, Anthony [4]
Affiliations
[1] Commonwealth Sci & Ind Res Org, Sydney, NSW, Australia
[2] Univ Sydney, Sydney, NSW, Australia
[3] Macquarie Univ, Sydney, NSW, Australia
[4] Commonwealth Sci & Ind Res Org, Brisbane, Qld, Australia
Keywords
information extraction; natural language processing; clinical natural language processing; named entity recognition; sequence tagging; neural language modeling; data augmentation
DOI
10.2196/24020
Chinese Library Classification
R-058
Abstract
Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes.
Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing.
Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.
Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%.
Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
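Illustrative sketch: the abstract mentions rule-based extraction of family member (FM) mentions and reports results as F1 scores. The Python snippet below is a minimal, hypothetical illustration of both ideas, not the authors' system. The term lexicon, the `extract_family_members` and `precision_recall_f1` function names, the `"NA"` side label, and the example sentence are all assumptions made for illustration only.

```python
import re

# Hypothetical lexicon of family-member terms (an assumption, not the paper's rule set).
FAMILY_TERMS = [
    "grandmother", "grandfather", "mother", "father", "sister",
    "brother", "daughter", "son", "aunt", "uncle", "cousin",
]

# Optional side of family, then a family-member term, then an optional plural "s".
FM_PATTERN = re.compile(
    r"\b(?:(maternal|paternal)\s+)?(" + "|".join(FAMILY_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def extract_family_members(text):
    """Return the set of (family_member, side) pairs matched in the text."""
    mentions = set()
    for match in FM_PATTERN.finditer(text):
        side = (match.group(1) or "NA").lower()  # "na" when no side is stated
        member = match.group(2).lower()
        mentions.add((member, side))
    return mentions


def precision_recall_f1(predicted, gold):
    """Compute set-based precision, recall, and F1 (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example note and gold annotations, for illustration only.
note = "Her maternal grandmother had breast cancer. Two brothers have diabetes."
predicted = extract_family_members(note)
gold = {("grandmother", "maternal"), ("brother", "na"), ("father", "na")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), round(precision, 2), round(recall, 2), round(f1, 2))
```

A pure lexicon match like this cannot resolve pronouns or negation, which is presumably one reason the abstract describes a hybrid method and coreference resolution for FM mentions.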
Pages: 24