ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis

Authors
Gan, Ziming [1]
Zhou, Doudou [2]
Rush, Everett [3]
Panickan, Vidul A. [4,5]
Ho, Yuk-Lam [5]
Ostrouchov, George [3]
Xu, Zhiwei [6]
Shen, Shuting [7]
Xiong, Xin [8]
Greco, Kimberly F. [8]
Hong, Chuan [7]
Bonzel, Clara-Lea [4]
Wen, Jun [4]
Costa, Lauren [5]
Cai, Tianrun [5,9]
Begoli, Edmon
Xia, Zongqi [10]
Gaziano, J. Michael [5,9]
Liao, Katherine P. [5,9]
Cho, Kelly [5,9]
Cai, Tianxi [4,5,8]
Lu, Junwei [5,8]
Affiliations
[1] Univ Chicago, Dept Stat, 5801 S Ellis Ave, Chicago, IL 60615 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore 117546, Singapore
[3] Oak Ridge Natl Lab, Bethel Valley Rd, Oak Ridge, TN 37830 USA
[4] Harvard Med Sch, 25 Shattuck St, Boston, MA 02115 USA
[5] VA Boston Healthcare Syst, 150 S Huntington Ave, Boston, MA 02130 USA
[6] Univ Michigan, Dept Stat, 500 S State St, Ann Arbor, MI 48109 USA
[7] Duke Univ, Dept Biostat & Bioinformat, 1121 West Main St, Durham, NC 27708 USA
[8] Harvard TH Chan Sch Publ Hlth, 677 Huntington Ave, Boston, MA 02115 USA
[9] Brigham & Womens Hosp, 60 Fenwood Rd, Boston, MA 02115 USA
[10] Univ Pittsburgh, Clin & Translat Sci, 3501 Fifth Ave, Pittsburgh, PA 15260 USA
Keywords
Electronic health records; Natural language processing; Representation learning; Knowledge graph; ALZHEIMER-DISEASE; IDENTIFY; MODERATE; RISK;
DOI
10.1016/j.jbi.2024.104761
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, the latter processed with natural language processing (NLP). The complexity of EHR data presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we propose an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.
Methods: Using data from 12.5 million Veterans Affairs patients, ARCH first derives embedding vectors and generates similarities, along with associated p-values, to measure the strength of relatedness between clinical features with statistical certainty quantification. Next, ARCH performs a sparse embedding regression to remove indirect linkage between features and build a sparse KG. Finally, ARCH was validated on various clinical tasks, including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, and sub-typing Alzheimer's disease patients.
Results: ARCH produces high-quality clinical embeddings and a KG for over 60,000 codified and narrative EHR concepts. The KG and embeddings are visualized in an R-shiny powered web-API. ARCH achieved high accuracy in detecting EHR concept relationships, with AUCs of 0.926 (codified) and 0.861 (NLP) for similar EHR concepts, and 0.810 (codified) and 0.843 (NLP) for related pairs. It detected drug side effects with an AUC of 0.723, which improved to 0.826 after fine-tuning. Detection power increased significantly when both codified and NLP features were used. Compared to other methods, ARCH has superior accuracy and enhances the performance of weakly supervised phenotyping algorithms. Notably, it successfully categorized Alzheimer's patients into two subgroups with varying mortality rates.
Conclusion: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
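Illustrative sketch: the abstract reports that relatedness between concepts is scored from embedding similarities and that detection of known concept pairs is evaluated by AUC. The snippet below is a minimal sketch of that kind of evaluation, not the authors' implementation. It assumes cosine similarity as the similarity measure, uses a made-up 4 x 3 toy embedding matrix, and the function names `cosine_similarity_matrix` and `auc_from_scores` are hypothetical. It does not cover the p-values or the sparse embedding regression step.

```python
import numpy as np


def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def auc_from_scores(pos, neg) -> float:
    """AUC as the probability that a positive pair outscores a negative pair.

    Ties count as 0.5 (the Mann-Whitney formulation of AUC).
    """
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())


# Toy embeddings for four hypothetical EHR concepts (assumed values).
emb = np.array([
    [1.0, 0.1, 0.0],  # concept A
    [0.9, 0.2, 0.0],  # concept B, placed close to A
    [0.0, 1.0, 0.1],  # concept C
    [0.1, 0.9, 0.0],  # concept D, placed close to C
])
sim = cosine_similarity_matrix(emb)

# Assumed labels: "known" related pairs versus unrelated comparison pairs.
known_pairs = [(0, 1), (2, 3)]
random_pairs = [(0, 2), (0, 3), (1, 2), (1, 3)]

pos_scores = [sim[i, j] for i, j in known_pairs]
neg_scores = [sim[i, j] for i, j in random_pairs]
auc = auc_from_scores(pos_scores, neg_scores)
print(f"AUC on toy pairs: {auc:.3f}")
```

On real data the known pairs would come from curated sources and the comparison pairs from random sampling; the toy labels here only show the mechanics of the calculation.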
Pages: 11