ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis

Times cited: 0

Authors:

Gan, Ziming [1]

Zhou, Doudou [2]

Rush, Everett [3]

Panickan, Vidul A. [4,5]

Ho, Yuk-Lam [5]

Ostrouchov, George [3]

Xu, Zhiwei [6]

Shen, Shuting [7]

Xiong, Xin [8]

Greco, Kimberly F. [8]

Hong, Chuan [7]

Bonzel, Clara-Lea [4]

Wen, Jun [4]

Costa, Lauren [5]

Cai, Tianrun [5,9]

Begoli, Edmon

Xia, Zongqi [10]

Gaziano, J. Michael [5,9]

Liao, Katherine P. [5,9]

Cho, Kelly [5,9]

Cai, Tianxi [4,5,8]

Lu, Junwei [5,8]

Affiliations:

[1] Univ Chicago, Dept Stat, 5801 S Ellis Ave, Chicago, IL 60615 USA

[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore 117546, Singapore

[3] Oak Ridge Natl Lab, Bethel Valley Rd, Oak Ridge, TN 37830 USA

[4] Harvard Med Sch, 25 Shattuck St, Boston, MA 02115 USA

[5] VA Boston Healthcare Syst, 150 S Huntington Ave, Boston, MA 02130 USA

[6] Univ Michigan, Dept Stat, 500 S State St, Ann Arbor, MI 48109 USA

[7] Duke Univ, Dept Biostat & Bioinformat, 1121 West Main St, Durham, NC 27708 USA

[8] Harvard TH Chan Sch Publ Hlth, 677 Huntington Ave, Boston, MA 02115 USA

[9] Brigham & Womens Hosp, 60 Fenwood Rd, Boston, MA 02115 USA

[10] Univ Pittsburgh, Clin & Translat Sci, 3501 Fifth Ave, Pittsburgh, PA 15260 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2025 / Vol. 162

Keywords:

Electronic health records; Natural language processing; Representation learning; Knowledge graph; ALZHEIMER-DISEASE; IDENTIFY; MODERATE; RISK;

DOI:

10.1016/j.jbi.2024.104761

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Objective: Electronic health record (EHR) systems contain a wealth of clinical data, stored both as codified data and as free-text narrative notes, from which features are extracted by natural language processing (NLP). The complexity of EHR data presents challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we propose Aggregated naRrative Codified Health (ARCH) records analysis, an efficient approach that generates a large-scale knowledge graph (KG) covering a comprehensive set of codified and narrative EHR features.
Methods: Using data from 12.5 million Veterans Affairs patients, ARCH first derives embedding vectors and computes similarities, along with associated p-values, to measure the strength of relatedness between clinical features with statistical certainty quantification. Next, ARCH performs a sparse embedding regression to remove indirect links between features, yielding a sparse KG. Finally, ARCH was validated on various clinical tasks, including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, and subtyping patients with Alzheimer's disease.
Results: ARCH produces high-quality clinical embeddings and a KG for over 60,000 codified and narrative EHR concepts. The KG and embeddings are visualized in an R Shiny-powered web API. ARCH achieved high accuracy in detecting relationships between EHR concepts, with AUCs of 0.926 (codified) and 0.861 (NLP) for similar concept pairs, and 0.810 (codified) and 0.843 (NLP) for related pairs. It detected drug side effects with an AUC of 0.723, which improved to 0.826 after fine-tuning. Detection power increased significantly when codified and NLP features were used together. Compared with other methods, ARCH achieved superior accuracy and improved the performance of weakly supervised phenotyping algorithms. Notably, it categorized patients with Alzheimer's disease into two subgroups with differing mortality rates.
Conclusion: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, which are useful for a wide range of predictive modeling tasks.
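The Methods section names two computational steps: scoring the relatedness of two concepts by the similarity of their embeddings with a p-value, and a sparse embedding regression that removes indirect links. The sketch below illustrates both under loudly stated assumptions, and is not the authors' implementation. Embeddings are taken as given (the abstract does not say how they are derived). Cosine similarity is assumed as the similarity measure. The p-value comes from a normal approximation to similarities between unrelated random vectors, a hypothetical stand-in for ARCH's own certainty quantification. "Sparse embedding regression" is read here as an L1-penalised (lasso) regression of one concept's embedding on candidate neighbours' embeddings, solved by coordinate descent. All function names, the penalty `lam=0.1`, and the toy vectors are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def null_pvalue(sim: float, null_sims: List[float]) -> float:
    """One-sided p-value P(S >= sim) under a normal fit to null similarities.
    Hypothetical stand-in, not ARCH's own certainty quantification."""
    mu = sum(null_sims) / len(null_sims)
    sd = math.sqrt(sum((s - mu) ** 2 for s in null_sims) / (len(null_sims) - 1))
    return 0.5 * math.erfc((sim - mu) / (sd * math.sqrt(2)))


def soft_threshold(x: float, lam: float) -> float:
    """Lasso shrinkage operator: sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_regression(X: List[List[float]], y: List[float], lam: float,
                      n_iter: int = 200) -> List[float]:
    """Lasso by cyclic coordinate descent, minimising
    (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    Rows of X are embedding dimensions; columns are candidate neighbour concepts."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # correlation of column j with the partial residual (feature j added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b


def demo(dim: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Toy check with made-up vectors: y depends directly on A and B,
    C is related to y only through A, and D is unrelated."""
    rng = random.Random(seed)

    def vec() -> List[float]:
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    A, B, D, e, noise = vec(), vec(), vec(), vec(), vec()
    C = [0.3 * a + ei for a, ei in zip(A, e)]
    y = [a + 0.8 * b + 0.1 * z for a, b, z in zip(A, B, noise)]
    # null similarities from pairs of unrelated random vectors
    null = [cosine(vec(), vec()) for _ in range(200)]
    p_direct = null_pvalue(cosine(y, A), null)
    X = [[A[i], B[i], C[i], D[i]] for i in range(dim)]
    return sparse_regression(X, y, lam=0.1), p_direct


if __name__ == "__main__":
    coefs, p = demo()
    print("coefficients for A, B, C, D:", [round(c, 3) for c in coefs])
    print("p-value for cos(y, A):", p)
```

In this toy setting the lasso should keep A and B with clearly positive coefficients while shrinking C to (or very near) zero, even though C is marginally correlated with y. That is the kind of indirect linkage the sparse regression step is described as removing.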

Pages: 11
