Using Supervised Learning Methods for Gene Selection in RNA-Seq Case-Control Studies

Cited by: 33
Authors
Wenric, Stephane [1, 2]
Shemirani, Ruhollah [3 ]
Affiliations
[1] Univ Liege, GIGA Res, Lab Human Genet, Liege, Belgium
[2] Mt Sinai Hosp, Icahn Sch Med, Charles Bronfman Inst Personalized Med, Dept Genet & Genom Sci, New York, NY 10029 USA
[3] Univ Southern Calif, Informat Sci Inst, Dept Comp Sci, Marina Del Rey, CA USA
Keywords
RNA-Seq; supervised learning; random forests; variational autoencoders; gene selection; feature selection; transcriptomics; gene expression; CANCER; EXPRESSION; EVOLUTION; RECEPTOR; GROWTH; IGF-1; TOOL;
DOI
10.3389/fgene.2018.00297
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Whole transcriptome studies typically yield large amounts of data, with expression values for all genes or transcripts of the genome. The search for genes of interest in a particular study setting can thus be a daunting task, usually relying on automated computational methods. Moreover, most biological questions imply that such a search should be performed in a multivariate setting, to take into account the inter-gene relationships. Differential expression analysis commonly yields large lists of genes deemed significant, even after adjustment for multiple testing, making the subsequent study possibilities extensive. Here, we explore the use of supervised learning methods to rank large ensembles of genes defined by their expression values measured with RNA-Seq in a typical two-class sample set. First, we use one of the variable importance measures generated by the random forests classification algorithm as a metric to rank genes. Second, we define the EPS (extreme pseudo-samples) pipeline, making use of VAEs (Variational Autoencoders) and regressors to extract a ranking of genes while leveraging the feature space of both virtual and comparable samples. We show that, on 12 cancer RNA-Seq data sets ranging from 323 to 1,210 samples, using either a random forests-based gene selection method or the EPS pipeline outperforms differential expression analysis for 9 and 8 out of the 12 datasets respectively, in terms of identifying subsets of genes associated with survival. These results demonstrate the potential of supervised learning-based gene selection methods in RNA-Seq studies and highlight the need to use such multivariate gene selection methods alongside the widely used differential expression analysis.
Pages: 9