Gaussian embedding for large-scale gene set analysis

Cited by: 7
Authors
Wang, Sheng [1]
Flynn, Emily R. [2]
Altman, Russ B. [1, 2, 3]
Affiliations
[1] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA
[2] Stanford Univ, Biomed Informat Training Program, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
Keywords
PROTEIN-INTERACTION NETWORKS; ENRICHMENT ANALYSIS; FUNCTIONAL-ANALYSIS; PATHWAYS; ONTOLOGY; CANCER
DOI
10.1038/s42256-020-0193-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gene sets, including protein complexes and signalling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumours, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a clinical prognostic and predictive subnetwork around neurofilament medium in sarcoma, which we validate in independent cohorts.

Gene sets can provide valuable information for gaining insight into disease mechanisms and cellular functions. In this paper, the authors use a Gaussian approach to represent gene sets and gene networks in a low-dimensional space, allowing for accurate prediction and decreased computational complexity.
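To make the idea of representing a gene set as a distribution rather than a single point more concrete, here is a minimal sketch. It is not the Set2Gaussian algorithm, which the abstract says derives its embedding from proximity in a protein-protein interaction network. Instead, it assumes gene vectors are already available and simply fits a diagonal Gaussian to the members of one set, then ranks genes by log density. The gene names, the 2-D vectors, and the helper functions `fit_diagonal_gaussian` and `log_density` are all hypothetical.

```python
import math

def fit_diagonal_gaussian(vectors, min_var=1e-6):
    """Return per-dimension mean and variance of equal-length vectors.

    min_var is a small floor (an assumption of this sketch) that keeps
    the variance strictly positive for very tight sets.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [
        sum((v[d] - mean[d]) ** 2 for v in vectors) / n + min_var
        for d in range(dim)
    ]
    return mean, var

def log_density(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )

# Hypothetical 2-D gene embeddings; real ones would come from a network model.
embeddings = {
    "geneA": [0.9, 1.1],
    "geneB": [1.0, 0.8],
    "geneC": [1.2, 1.0],
    "geneD": [-2.0, 3.0],
}
gene_set = ["geneA", "geneB", "geneC"]

# The set is summarised by a mean (its location) and a variance (its spread).
mean, var = fit_diagonal_gaussian([embeddings[g] for g in gene_set])

# Rank every gene by how plausible it is as a member of the set.
scores = {g: log_density(v, mean, var) for g, v in embeddings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The variance term is what a single-point embedding lacks: a tight set penalises distant genes heavily, while a diffuse set tolerates them, which is one plausible reading of why a distribution can help with member identification.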
Pages: 387-395
Page count: 9
Source: Nature Machine Intelligence, 2020, 2: 387-395