A data-driven architecture using natural language processing to improve phenotyping efficiency and accelerate genetic diagnoses of rare disorders

Authors
Parikh, Jignesh R. [1]
Genetti, Casie A. [2]
Aykanat, Asli [2]
Brownstein, Catherine A. [2]
Schmitz-Abe, Klaus [2]
Danowski, Morgan [2]
Quitadamo, Andrew [2,4]
Madden, Jill A. [2]
Yacoubian, Calum [5]
Gain, Richard [5]
Williams, Tessa [5]
Meskell, Mary [5]
Brown, Andrew [5]
Frith, Alison [5]
Rockowitz, Shira [2,4]
Sliz, Piotr [2,4]
Agrawal, Pankaj B. [2,6]
Defay, Thomas [3]
McDonagh, Paul [3,7]
Reynders, John [3,8]
Lefebvre, Sebastien [3]
Beggs, Alan H. [2]
Affiliations
[1] J Sq Labs LLC, Natick, MA 01760 USA
[2] Harvard Med Sch, Manton Ctr Orphan Dis Res, Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02115 USA
[3] Alexion Pharmaceut Inc, Boston, MA 02210 USA
[4] Harvard Med Sch, Boston Childrens Hosp, Computat Hlth Informat Program, Boston, MA 02115 USA
[5] Clinithink Ltd, London N1 6DR, England
[6] Harvard Med Sch, Boston Childrens Hosp, Div Newborn Med, Boston, MA 02115 USA
[7] Sema4, Stamford, CT 06902 USA
[8] Latent Strategies LLC, Newton, MA 02465 USA
Funding
National Institutes of Health (USA)
Keywords
REANALYSIS
DOI
10.1016/j.xhgg.2021.100035
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Effective genetic diagnosis requires the correlation of genetic variant data with detailed phenotypic information. However, manual encoding of clinical data into machine-readable forms is laborious and subject to observer bias. Natural language processing (NLP) of electronic health records has great potential to enhance reproducibility at scale but suffers from idiosyncrasies in physician notes and other medical records. We developed methods to optimize NLP outputs for automated diagnosis. We filtered NLP-extracted Human Phenotype Ontology (HPO) terms to more closely resemble manually extracted terms and identified filter parameters across a three-dimensional space for optimal gene prioritization. We then developed a tiered pipeline that reduces manual effort by prioritizing smaller subsets of genes to consider for genetic diagnosis. Our filtering pipeline enabled NLP-based extraction of HPO terms to serve as a sufficient replacement for manual extraction in 92% of prospectively evaluated cases. In 75% of cases, the correct causal gene was ranked higher with our applied filters than without any filters. We describe a framework that can maximize the utility of NLP-based phenotype extraction for gene prioritization and diagnosis. The framework is implemented within a cloud-based modular architecture that can be deployed across health and research institutions.
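The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the three filter dimensions or the scoring function, so everything here is an assumption made for illustration: the parameters `min_mentions`, `min_depth`, and `max_terms`, the `ExtractedTerm` record, the mention counts and depth values, and the use of Jaccard overlap with manually curated terms as a stand-in objective (the paper optimizes for gene prioritization, which is not reproduced here).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExtractedTerm:
    """One NLP-extracted HPO term (hypothetical record layout)."""
    hpo_id: str
    mentions: int  # assumed: how often the term was found in the notes
    depth: int     # assumed: ontology depth, used as a specificity proxy


def filter_terms(terms, min_mentions, min_depth, max_terms):
    """Keep frequent, specific terms and cap the list length."""
    kept = [t for t in terms
            if t.mentions >= min_mentions and t.depth >= min_depth]
    # Most-mentioned first; ties broken by depth, then by ID for determinism.
    kept.sort(key=lambda t: (-t.mentions, -t.depth, t.hpo_id))
    return [t.hpo_id for t in kept[:max_terms]]


def jaccard(a, b):
    """Set overlap between two term lists (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def grid_search(terms, manual_terms, grid):
    """Try every combination in the 3-D grid; return (best score, params)."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = jaccard(filter_terms(terms, **params), manual_terms)
        if best is None or score > best[0]:
            best = (score, params)
    return best


# Illustrative input: real HPO IDs, made-up counts and depths.
nlp_terms = [
    ExtractedTerm("HP:0001250", mentions=5, depth=4),
    ExtractedTerm("HP:0000118", mentions=9, depth=1),
    ExtractedTerm("HP:0001263", mentions=3, depth=5),
    ExtractedTerm("HP:0004322", mentions=1, depth=6),
]
manual_terms = ["HP:0001250", "HP:0001263"]
grid = {"min_mentions": [1, 2], "min_depth": [1, 3], "max_terms": [2, 5]}

best_score, best_params = grid_search(nlp_terms, manual_terms, grid)
print(best_score, best_params)
```

In this toy input, the very general but frequently mentioned term is what the depth threshold removes; a real pipeline would score each parameter combination by the rank of the known causal gene rather than by term overlap.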
Pages: 10