Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis

Cited by: 6
Authors
Rybinski, Maciej [1]
Dai, Xiang [1,2]
Singh, Sonit [1,3]
Karimi, Sarvnaz [1]
Nguyen, Anthony [4]
Affiliations
[1] Commonwealth Sci & Ind Res Org, Sydney, NSW, Australia
[2] Univ Sydney, Sydney, NSW, Australia
[3] Macquarie Univ, Sydney, NSW, Australia
[4] Commonwealth Sci & Ind Res Org, Brisbane, Qld, Australia
Keywords
information extraction; natural language processing; clinical natural language processing; named entity recognition; sequence tagging; neural language modeling; data augmentation
DOI
10.2196/24020
Chinese Library Classification number
R-058
Abstract
Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes.
Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing.
Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems.
Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%.
Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
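Illustrative sketch (not from the paper): the abstract mentions rule-based extraction of family member (FM) mentions but does not give the rules. The Python snippet below shows one way such a rule could look, using a small regular-expression lexicon. The term lists, the `FM_PATTERN` name, and the `extract_family_members` helper are all assumptions made for illustration, and the authors' actual hybrid method may differ substantially.

```python
import re

# Hypothetical lexicons; the study's real rules are not given in the abstract.
FAMILY_TERMS = [
    "mother", "father", "sister", "brother", "daughter", "son",
    "grandmother", "grandfather", "aunt", "uncle", "cousin",
]
SIDE_TERMS = ["maternal", "paternal"]

# Optional side-of-family modifier, then a family-member term (singular or plural).
FM_PATTERN = re.compile(
    r"\b(?:(?P<side>" + "|".join(SIDE_TERMS) + r")\s+)?"
    r"(?P<member>" + "|".join(FAMILY_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def extract_family_members(text):
    """Return (member, side) pairs in order of appearance; side is None if absent."""
    mentions = []
    for match in FM_PATTERN.finditer(text):
        side = match.group("side")
        mentions.append(
            (match.group("member").lower(), side.lower() if side else None)
        )
    return mentions


if __name__ == "__main__":
    note = "Her maternal grandmother had breast cancer and two brothers have diabetes."
    print(extract_family_members(note))
```

A lexicon rule like this finds mentions only. Linking each family member to the right disease, and resolving pronouns such as "she", would need the coreference and named entity recognition steps the abstract describes.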
Pages: 24