Enhancing patient representation learning with inferred family pedigrees improves disease risk prediction

Cited by: 0
Authors
Huang, Xiayuan [1]
Arora, Jatin [2]
Erzurumluoglu, Abdullah Mesut [2]
Stanhope, Stephen A. [3]
Lam, Daniel [4]
Khoueiry, Pierre
Jensen, Jan N.
Cai, James
Lawless, Nathan
Kriegl, Jan
Ding, Zhihao
de Jong, Johann [6, 7]
Zhao, Hongyu [1]
Wang, Zuoheng [1, 2, 5]
Affiliations
[1] Yale Univ, Sch Publ Hlth, Dept Biostat, New Haven, CT 06510 USA
[2] Boehringer Ingelheim Pharm GmbH & Co KG, Global Computat Biol & Digital Sci, Human Genet, D-88400 Biberach, Germany
[3] Boehringer Ingelheim GmbH & Co KG, Real World Data & Analyt, Global Med Affairs, Ridgefield, CT 06877 USA
[4] Boehringer Ingelheim Pharm GmbH & Co KG, CB CMDR, Global Computat Biol & Digital Sci, D-88400 Biberach, Germany
[5] Yale Univ, Sch Med, Dept Biomed Informat & Data Sci, New Haven, CT 06510 USA
[6] Boehringer Ingelheim Pharm GmbH & Co KG, Global Computat Biol & Digital Sci, Stat Modeling, D-88400 Biberach, Germany
[7] UCB Biosci GmbH, Adv Analyt Patient Solut, D-40789 Monheim, Germany
Keywords
electronic health records; patient modeling; disease risk prediction; graph attention networks; ULCERATIVE-COLITIS; HERITABILITY; HISTORY; RECORD
DOI
10.1093/jamia/ocae297
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Background: Machine learning and deep learning are powerful tools for analyzing electronic health records (EHRs) in healthcare research. Although family health history has been recognized as a major predictor for a wide spectrum of diseases, research has so far adopted a limited view of family relations, essentially treating patients as independent samples in the analysis.
Methods: To address this gap, we present ALIGATEHR, which models inferred family relations in a graph attention network augmented with an attention-based medical ontology representation, thus accounting for the complex influence of genetics, shared environmental exposures, and disease dependencies.
Results: Taking disease risk prediction as a use case, we demonstrate that explicitly modeling family relations significantly improves predictions across the disease spectrum. We then show how ALIGATEHR's attention mechanism, which links patients' disease risk to their relatives' clinical profiles, successfully captures genetic aspects of diseases using longitudinal EHR diagnosis data. Finally, we use ALIGATEHR to successfully distinguish the 2 main inflammatory bowel disease subtypes with highly shared risk factors and symptoms (Crohn's disease and ulcerative colitis).
Conclusion: Overall, our results highlight that family relations should not be overlooked in EHR research and illustrate ALIGATEHR's great potential for enhancing patient representation learning for predictive and interpretable modeling of EHRs.
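As a rough illustration of the idea in the Methods summary (attention over a patient's inferred relatives), the sketch below implements one generic single-head graph attention layer in NumPy on a toy pedigree. It is not the ALIGATEHR implementation: the function `gat_layer`, the toy family graph, the feature sizes, and the random weights are all assumptions made for illustration, and the medical ontology component and any training procedure are omitted.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """One generic single-head graph attention layer (hypothetical helper).

    H: (n, d_in) patient feature matrix
    A: (n, n) 0/1 adjacency matrix, including self-loops
    W: (d_in, d_out) projection; a: (2 * d_out,) attention vector
    Returns (H_out, alpha), where alpha[i, j] is the weight patient i
    places on relative j and each row of alpha sums to 1.
    """
    Z = H @ W                                   # projected features, (n, d_out)
    d_out = Z.shape[1]
    # a^T [z_i || z_j] decomposes into a source term plus a target term.
    src = Z @ a[:d_out]
    dst = Z @ a[d_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    e = np.where(A > 0, e, -np.inf)             # only relatives (and self) attend
    e = e - e.max(axis=1, keepdims=True)        # stabilize the softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha                     # output nonlinearity omitted

# Toy pedigree (assumed): 0 = mother, 1 = father, 2 = child, 3 = unrelated patient.
A = np.eye(4)
for i, j in [(0, 2), (1, 2)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))     # stand-in for per-patient EHR embeddings
W = rng.normal(size=(5, 3))
a = rng.normal(size=(6,))

H_out, alpha = gat_layer(H, A, W, a)
print(np.round(alpha, 3))       # row 2 spreads weight over mother, father, and self
```

In this sketch the unrelated patient can attend only to itself, so its output is just its own projected features, while the child's representation mixes in both parents' profiles. The attention weights in `alpha` are the kind of quantity the abstract refers to when it says that attention links a patient's risk to relatives' clinical profiles.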
Pages: 435-446
Number of pages: 12