A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data

Times cited: 10
Authors
Mandal, Sayan [1]
Guzman-Saenz, Aldo [2]
Haiminen, Niina [2]
Basu, Saugata [3]
Parida, Laxmi [2]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] TJ Watson Res Ctr, IBM Res, Yorktown Hts, NY 10598 USA
[3] Purdue Univ, W Lafayette, IN 47907 USA
Keywords
Topological data analysis; Gene expression; Phenotype prediction; Parkinson's disease; Persistent homology
DOI
10.1007/978-3-030-42266-0_14
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The goal of this study was to investigate whether gene expression measured by RNA sequencing contains enough signal to separate healthy from afflicted individuals in a phenotype prediction setting. Standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task, so we devised an approach that augments machine learning with topological data analysis. We describe a framework for predicting phenotype values in which gene expression data are transformed into sample-specific topological signatures using feature subsampling and persistent homology. On Parkinson's disease phenotype prediction, the topological data analysis approach developed in this work yielded improved results compared with standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and that the subtle signal in this high-dimensional data reveals itself when the intricate topological connections between expressed genes are taken into account.
Pages: 178-187
Page count: 10
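The abstract states only that sample-specific topological signatures are obtained through feature subsampling and persistent homology; it does not say how the point clouds are built or which homology dimensions are used. The Python sketch below is one hypothetical construction under stated assumptions, not the authors' method. Assumptions: the same random gene subsets are shared by all samples; each subset maps one sample's expression profile to a single point, so every sample yields its own point cloud; only 0-dimensional homology of the Vietoris-Rips filtration is computed (via union-find, with pairwise Euclidean distance as the filtration scale); and the helper names `make_gene_subsets` and `topological_signature` are hypothetical. Higher-dimensional features would normally come from a dedicated persistent homology library.

```python
import itertools
import math
import random


def h0_persistence(points):
    """Finite death times of the 0-dimensional Vietoris-Rips barcode.

    Every point is born at scale 0, and pairwise Euclidean distance is used
    as the filtration scale. Processing edges in order of length (Kruskal's
    algorithm with union-find), each edge that joins two different components
    kills one of them at that edge's length. The single infinite bar of the
    last surviving component is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
            if len(deaths) == n - 1:  # all points are now in one component
                break
    return deaths


def make_gene_subsets(n_genes, n_subsets, subset_size, seed=0):
    """Hypothetical feature subsampling: random gene index subsets shared by all samples."""
    rng = random.Random(seed)
    return [rng.sample(range(n_genes), subset_size) for _ in range(n_subsets)]


def topological_signature(expression, subsets):
    """Hypothetical sample-specific signature.

    Each gene subset turns one sample's expression profile into a point in
    R^subset_size; the sample's point cloud is the set of these points. The
    signature is the H0 death times sorted in descending order, which has a
    fixed length of len(subsets) - 1, so signatures line up across samples.
    """
    cloud = [tuple(expression[g] for g in subset) for subset in subsets]
    return sorted(h0_persistence(cloud), reverse=True)


if __name__ == "__main__":
    # Synthetic data only: 100 "genes" drawn from a standard normal distribution.
    subsets = make_gene_subsets(n_genes=100, n_subsets=30, subset_size=4, seed=1)
    rng = random.Random(2)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    signature = topological_signature(sample, subsets)
    print(len(signature))  # 29
```

Because every sample is summarized by a vector of the same length, the signatures could then be passed to any standard classifier, in the spirit of the abstract's idea of augmenting machine learning with topological features. The synthetic data in the demo are illustrative only and say nothing about the paper's results.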