A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data

Cited by: 10
Authors
Mandal, Sayan [1]
Guzman-Saenz, Aldo [2]
Haiminen, Niina [2]
Basu, Saugata [3]
Parida, Laxmi [2]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] TJ Watson Res Ctr, IBM Res, Yorktown Hts, NY 10598 USA
[3] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
Topological data analysis; Gene expression; Phenotype prediction; Parkinson's disease; Persistent homology
DOI
10.1007/978-3-030-42266-0_14
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The goal of this study was to investigate whether gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore, we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson's disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high-dimensional data reveals itself when considering the intricate topological connections between expressed genes.
Pages: 178-187
Page count: 10
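
The abstract mentions turning each sample's gene expression into a topological signature through feature subsampling and persistent homology, but this record does not give the construction. The following is only a minimal sketch of that general idea under stated assumptions, not the authors' method: the function names, the way subsampled genes are grouped into a point cloud, the restriction to 0-dimensional persistence, and all parameter values (`n_points`, `dim`, `n_rounds`, `seed`) are hypothetical choices made for illustration.

```python
import math
import random


def h0_death_times(points):
    """Death times of the 0-dimensional persistence classes of a
    Vietoris-Rips filtration on `points`. These equal the edge lengths of a
    minimum spanning tree, found here by Kruskal's algorithm with union-find."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one H0 class dies here.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite values in ascending order


def topological_signature(expression, n_points=8, dim=3, n_rounds=20, seed=0):
    """Hypothetical per-sample signature (an assumption, not the paper's
    definition): average the sorted H0 death times over repeated random
    gene subsamples of one sample's expression vector."""
    rng = random.Random(seed)
    total = [0.0] * (n_points - 1)
    for _ in range(n_rounds):
        # Feature subsampling: draw n_points * dim distinct gene indices.
        genes = rng.sample(range(len(expression)), n_points * dim)
        # Assumed construction: group the drawn genes into n_points points in R^dim.
        cloud = [
            tuple(expression[g] for g in genes[p * dim:(p + 1) * dim])
            for p in range(n_points)
        ]
        for k, death in enumerate(h0_death_times(cloud)):
            total[k] += death
    return [t / n_rounds for t in total]
```

Under these assumptions, each sample yields a fixed-length vector (`n_points - 1` values) that could be passed to any standard classifier in place of, or alongside, the raw expression values. A fuller treatment would likely use higher-dimensional homology from a dedicated persistent homology library.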