ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis

Cited by: 0

Authors:

Gan, Ziming ^{[1]}

Zhou, Doudou ^{[2]}

Rush, Everett ^{[3]}

Panickan, Vidul A. ^{[4,5]}

Ho, Yuk-Lam ^{[5]}

Ostrouchov, George ^{[3]}

Xu, Zhiwei ^{[6]}

Shen, Shuting ^{[7]}

Xiong, Xin ^{[8]}

Greco, Kimberly F. ^{[8]}

Hong, Chuan ^{[7]}

Bonzel, Clara-Lea ^{[4]}

Wen, Jun ^{[4]}

Costa, Lauren ^{[5]}

Cai, Tianrun ^{[5,9]}

Begoli, Edmon

Xia, Zongqi ^{[10]}

Gaziano, J. Michael ^{[5,9]}

Liao, Katherine P. ^{[5,9]}

Cho, Kelly ^{[5,9]}

Cai, Tianxi ^{[4,5,8]}

Lu, Junwei ^{[5,8]}

Affiliations:

[1] Univ Chicago, Dept Stat, 5801 S Ellis Ave, Chicago, IL 60615 USA

[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore 117546, Singapore

[3] Oak Ridge Natl Lab, Bethel Valley Rd, Oak Ridge, TN 37830 USA

[4] Harvard Med Sch, 25 Shattuck St, Boston, MA 02115 USA

[5] VA Boston Healthcare Syst, 150 S Huntington Ave, Boston, MA 02130 USA

[6] Univ Michigan, Dept Stat, 500 S State St, Ann Arbor, MI 48109 USA

[7] Duke Univ, Dept Biostat & Bioinformat, 1121 West Main St, Durham, NC 27708 USA

[8] Harvard TH Chan Sch Publ Hlth, 677 Huntington Ave, Boston, MA 02115 USA

[9] Brigham & Womens Hosp, 60 Fenwood Rd, Boston, MA 02115 USA

[10] Univ Pittsburgh, Clin & Translat Sci, 3501 Fifth Ave, Pittsburgh, PA 15260 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2025 / Volume 162

Keywords:

Electronic health records; Natural language processing; Representation learning; Knowledge graph; ALZHEIMER-DISEASE; IDENTIFY; MODERATE; RISK

DOI:

10.1016/j.jbi.2024.104761

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored both as codified data and as free-text narrative notes, the latter processed with natural language processing (NLP). The complexity of EHR data presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.
Methods: Using data from 12.5 million Veterans Affairs patients, ARCH first derives embedding vectors and generates similarities, along with associated p-values, to measure the strength of relatedness between clinical features with statistical certainty quantification. Next, ARCH performs a sparse embedding regression to remove indirect linkage between features and build a sparse KG. Finally, ARCH was validated on various clinical tasks, including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, and sub-typing Alzheimer's disease patients.
Results: ARCH produces high-quality clinical embeddings and a KG for over 60,000 codified and narrative EHR concepts. The KG and embeddings are visualized in an R-shiny powered web-API. ARCH achieved high accuracy in detecting EHR concept relationships, with AUCs of 0.926 (codified) and 0.861 (NLP) for similar EHR concepts, and 0.810 (codified) and 0.843 (NLP) for related pairs. It detected drug side effects with an AUC of 0.723, which improved to 0.826 after fine-tuning. Detection power increased significantly when both codified and NLP features were used. ARCH was more accurate than the other methods it was compared with and improved the performance of weakly supervised phenotyping algorithms. Notably, it successfully categorized Alzheimer's patients into two subgroups with varying mortality rates.
Conclusion: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
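
The two core steps named in the Methods (similarities between embedding vectors, then a sparse regression that drops indirect links) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the embeddings are derived or which sparse estimator is used, so the sketch assumes cosine similarity and an L1-penalised (lasso) regression solved by proximal gradient descent, and it omits the p-value computation entirely. All function names, the toy embeddings, and the penalty value `lam` are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of an (n_concepts, dim) array."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    b = np.zeros(X.shape[1])
    # Step size 1/L, where L is the squared spectral norm of X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))                        # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

def sparse_neighbours(emb, target, lam):
    """Regress the target embedding on all others; non-zero coefficients are kept as edges."""
    others = [i for i in range(emb.shape[0]) if i != target]
    coef = lasso_ista(emb[others].T, emb[target], lam)
    return {others[j]: float(c) for j, c in enumerate(coef) if abs(c) > 1e-8}

# Hypothetical toy embeddings: concept 3 points in the same direction as concept 0.
emb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [2.0, 0.0, 0.0]])
sim = cosine_similarity_matrix(emb)
edges = sparse_neighbours(emb, target=3, lam=0.1)
```

In this toy case the regression keeps only the edge from concept 3 to concept 0 and discards the two unrelated concepts, which is the kind of pruning a sparse KG relies on.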


Pages: 11
