Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes

Cited by: 97
Authors
Himmelstein, Daniel S. [1]
Baranzini, Sergio E. [1, 2, 3]
Affiliations
[1] Univ Calif San Francisco, Biol & Med Informat, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Dept Neurol, San Francisco, CA USA
[3] Univ Calif San Francisco, Inst Human Genet, San Francisco, CA 94143 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
GENOME-WIDE ASSOCIATION; RANDOM-WALK; C-REL; EXPRESSION; ONTOLOGY; REGULARIZATION; CELLS; TOOL; SET; MAP;
DOI
10.1371/journal.pcbi.1004259
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks (graphs with multiple node and edge types) for accomplishing both tasks. First we constructed a network with 18 node types (genes, diseases, tissues, pathophysiologies, and 14 collections from MSigDB, the Molecular Signatures Database) and 19 edge types from high-throughput publicly available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io).
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.
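The idea of extracting features that describe the topology between a gene and a disease can be illustrated with a minimal sketch. The abstract does not define the features, so the plain path count used here is an assumption chosen for simplicity, and the toy nodes, edge type names, and the helpers `build_adjacency` and `count_paths` are hypothetical.

```python
from collections import defaultdict

# Toy heterogeneous network: each node is a (type, name) pair and each edge
# carries a type. All nodes and edges here are invented for illustration.
EDGES = [
    (("gene", "G1"), "interacts", ("gene", "G2")),
    (("gene", "G1"), "interacts", ("gene", "G3")),
    (("gene", "G2"), "associates", ("disease", "D1")),
    (("gene", "G3"), "associates", ("disease", "D1")),
    (("gene", "G1"), "expressed_in", ("tissue", "T1")),
    (("disease", "D1"), "localizes_to", ("tissue", "T1")),
]


def build_adjacency(edges):
    """Index neighbors by (node, edge type); edges are treated as undirected."""
    adjacency = defaultdict(set)
    for source, edge_type, target in edges:
        adjacency[(source, edge_type)].add(target)
        adjacency[(target, edge_type)].add(source)
    return adjacency


def count_paths(adjacency, start, end, edge_types):
    """Count paths from start to end that follow edge_types in order.

    Assumption: a raw path count stands in for whatever topology feature
    the authors actually used. Nodes are never revisited within a path.
    """
    def walk(node, depth, visited):
        if depth == len(edge_types):
            return 1 if node == end else 0
        total = 0
        for neighbor in adjacency.get((node, edge_types[depth]), ()):
            if neighbor not in visited:
                total += walk(neighbor, depth + 1, visited | {neighbor})
        return total

    return walk(start, 0, {start})


adjacency = build_adjacency(EDGES)
gene, disease = ("gene", "G1"), ("disease", "D1")

# One feature per sequence of edge types connecting the gene to the disease.
features = {
    "interacts>associates": count_paths(
        adjacency, gene, disease, ["interacts", "associates"]),
    "expressed_in>localizes_to": count_paths(
        adjacency, gene, disease, ["expressed_in", "localizes_to"]),
    "associates": count_paths(adjacency, gene, disease, ["associates"]),
}
print(features)
```

Each gene-disease pair yields one such feature vector, which a classifier trained on known associations could then score. That pairing of features with a trained model follows the abstract, while the specific feature definition above does not come from it.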
Pages: 27