A data-driven architecture using natural language processing to improve phenotyping efficiency and accelerate genetic diagnoses of rare disorders

Cited by: 1
Authors
Parikh, Jignesh R. [1]
Genetti, Casie A. [2]
Aykanat, Asli [2]
Brownstein, Catherine A. [2]
Schmitz-Abe, Klaus [2]
Danowski, Morgan [2]
Quitadomo, Andrew [2, 4]
Madden, Jill A. [2]
Yacoubian, Calum [5]
Gain, Richard [5]
Williams, Tessa [5]
Meskell, Mary [5]
Brown, Andrew [5]
Frith, Alison [5]
Rockowitz, Shira [2, 4]
Sliz, Piotr [2, 4]
Agrawal, Pankaj B. [2, 6]
Defay, Thomas [3]
McDonagh, Paul [3, 7]
Reynders, John [3, 8]
Lefebvre, Sebastien [3]
Beggs, Alan H. [2]
Affiliations
[1] J Sq Labs LLC, Natick, MA 01760, USA
[2] Harvard Medical School, Manton Center for Orphan Disease Research, Boston Children's Hospital, Division of Genetics and Genomics, Boston, MA 02115, USA
[3] Alex Pharmaceut Inc, Boston, MA 02210, USA
[4] Harvard Medical School, Boston Children's Hospital, Computational Health Informatics Program, Boston, MA 02115, USA
[5] Clinithink Ltd, London N1 6DR, England
[6] Harvard Medical School, Boston Children's Hospital, Division of Newborn Medicine, Boston, MA 02115, USA
[7] Sema4, Stamford, CT 06902, USA
[8] Latent Strategies LLC, Newton, MA 02465, USA
Funding
National Institutes of Health
Keywords
Reanalysis
DOI
10.1016/j.xhgg.2021.100035
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Effective genetic diagnosis requires the correlation of genetic variant data with detailed phenotypic information. However, manual encoding of clinical data into machine-readable forms is laborious and subject to observer bias. Natural language processing (NLP) of electronic health records has great potential to enhance reproducibility at scale but suffers from idiosyncrasies in physician notes and other medical records. We developed methods to optimize NLP outputs for automated diagnosis. We filtered NLP-extracted Human Phenotype Ontology (HPO) terms to more closely resemble manually extracted terms and identified filter parameters across a three-dimensional space for optimal gene prioritization. We then developed a tiered pipeline that reduces manual effort by prioritizing smaller subsets of genes to consider for genetic diagnosis. Our filtering pipeline enabled NLP-based extraction of HPO terms to serve as a sufficient replacement for manual extraction in 92% of prospectively evaluated cases. In 75% of cases, the correct causal gene was ranked higher with our applied filters than without any filters. We describe a framework that can maximize the utility of NLP-based phenotype extraction for gene prioritization and diagnosis. The framework is implemented within a cloud-based modular architecture that can be deployed across health and research institutions.
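The abstract describes filtering NLP-extracted HPO terms and searching a three-dimensional parameter space for the setting that ranks the known causal gene best, then comparing filtered against unfiltered ranks. It does not say what the three parameters are. The following is therefore only a minimal Python sketch under invented assumptions, not the authors' method: the three filter parameters (minimum mention count, minimum ontology depth, maximum number of terms kept), the `Term` fields, the overlap-count gene scorer, all function names, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean


@dataclass(frozen=True)
class Term:
    hpo_id: str    # placeholder identifier, not a real HPO ID
    mentions: int  # hypothetical: times the NLP engine extracted the term across notes
    depth: int     # hypothetical: depth of the term in the ontology (specificity proxy)


def filter_terms(terms, min_mentions, min_depth, max_terms):
    """Hypothetical three-parameter filter applied to NLP-extracted terms."""
    kept = [t for t in terms if t.mentions >= min_mentions and t.depth >= min_depth]
    # Prefer frequently mentioned, then more specific, terms; hpo_id breaks ties.
    kept.sort(key=lambda t: (-t.mentions, -t.depth, t.hpo_id))
    return kept[:max_terms]


def rank_genes(terms, gene_annotations):
    """Toy prioritizer: score = number of patient terms annotated to the gene; rank 1 is best."""
    ids = {t.hpo_id for t in terms}
    scores = {g: len(ids & ann) for g, ann in gene_annotations.items()}
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}


def causal_rank(case, gene_annotations, params=None):
    """Rank of the known causal gene, with filtering (params) or without it (None)."""
    terms = case["terms"] if params is None else filter_terms(case["terms"], *params)
    return rank_genes(terms, gene_annotations)[case["causal_gene"]]


def grid_search(cases, gene_annotations, grid):
    """Return the parameter triple with the lowest mean causal-gene rank (ties: smallest triple)."""
    return min(
        product(*grid),
        key=lambda p: (mean(causal_rank(c, gene_annotations, p) for c in cases), p),
    )


def fraction_improved(cases, gene_annotations, params):
    """Share of cases where filtering ranks the causal gene strictly higher than no filtering."""
    better = sum(
        causal_rank(c, gene_annotations, params) < causal_rank(c, gene_annotations)
        for c in cases
    )
    return better / len(cases)


# Invented toy data: two genes and two solved cases.
GENE_ANNOTATIONS = {"GENE_A": {"t1", "t2", "t3"}, "GENE_B": {"t4", "t5", "t6"}}
CASES = [
    {"causal_gene": "GENE_A",
     "terms": [Term("t1", 5, 6), Term("t2", 3, 5), Term("t4", 1, 2),
               Term("t5", 1, 3), Term("t6", 1, 4)]},
    {"causal_gene": "GENE_B",
     "terms": [Term("t4", 4, 5), Term("t5", 2, 4), Term("t1", 1, 1)]},
]
GRID = [(1, 2, 3), (1, 3), (2, 10)]  # min_mentions, min_depth, max_terms

if __name__ == "__main__":
    best = grid_search(CASES, GENE_ANNOTATIONS, GRID)
    print("best (min_mentions, min_depth, max_terms):", best)
    print("fraction of cases improved:", fraction_improved(CASES, GENE_ANNOTATIONS, best))
```

On the toy data, the unfiltered term list lets three rarely mentioned terms push the decoy GENE_B above the causal GENE_A in the first case. The selected setting keeps only the two most frequently mentioned terms, which moves the causal gene from rank 2 to rank 1 in one of the two cases. A real evaluation would score the grid on held-out solved cases and use an established phenotype-driven prioritizer rather than a raw overlap count.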
Pages: 10