A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data

Cited by: 70
Authors
Xiao, Yawen [1 ,2 ]
Wu, Jun [3 ,4 ]
Lin, Zongli [5 ]
Zhao, Xiaodong [6 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
[2] Minist Educ, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] East China Normal Univ, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Inst Biomed Sci, Shanghai 200241, Peoples R China
[4] East China Normal Univ, Sch Life Sci, Shanghai 200241, Peoples R China
[5] Univ Virginia, Charles L Brown Dept Elect & Comp Engn, POB 400743, Charlottesville, VA 22904 USA
[6] Shanghai Jiao Tong Univ, Sch Biomed Engn, Shanghai 200240, Peoples R China
Keywords
Stacked sparse auto-encoder; Cancer prediction; Gene expression data; Semi-supervised learning; Deep learning; FEATURE-SELECTION; MACHINE; AUTOENCODER; DIAGNOSIS; PROGNOSIS;
DOI
10.1016/j.cmpb.2018.10.004
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background and objective: Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, a growing amount of high-dimensional data, such as RNA-seq data, has been generated, which calls for machine learning methods that can handle large-scale data effectively in order to support accurate treatment decisions. Methods: In this paper, we present a semi-supervised deep learning strategy, stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE-based method employs greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples. Results: We tested the proposed SSAE model on three public RNA-seq data sets covering three types of cancer and compared its prediction performance with several commonly used classification methods. The results indicate that our approach outperforms the other methods on all three cancer data sets across various metrics. Conclusions: The proposed SSAE-based semi-supervised deep learning model shows a promising ability to process high-dimensional gene expression data and proves to be effective and accurate for cancer prediction. © 2018 Elsevier B.V. All rights reserved.
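The abstract names a sparsity penalty term but does not give its form. As a rough illustration only, the sketch below assumes the commonly used KL-divergence penalty on mean hidden activations, added to a squared reconstruction error for a single auto-encoder layer. The function names, the sigmoid activations, and the values of `rho` and `beta` are hypothetical choices for this sketch, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity_penalty(rho, rho_hat, eps=1e-8):
    """KL divergence between a target activation rho and the observed
    mean activation rho_hat of each hidden unit, summed over units.
    (Assumed form of the penalty; the abstract does not specify it.)"""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)  # avoid log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sparse_ae_loss(X, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Reconstruction error plus weighted sparsity penalty for one layer.
    Returns the loss and the hidden code H; in greedy layer-wise
    pre-training, H would serve as the input to the next layer."""
    H = sigmoid(X @ W_enc + b_enc)          # hidden representation
    X_hat = sigmoid(H @ W_dec + b_dec)      # reconstruction of the input
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = H.mean(axis=0)                # mean activation per hidden unit
    return recon + beta * kl_sparsity_penalty(rho, rho_hat), H

# Toy data standing in for scaled expression values (8 samples, 20 genes).
rng = np.random.default_rng(0)
X = rng.random((8, 20))
W_enc = rng.normal(scale=0.1, size=(20, 5))
b_enc = np.zeros(5)
W_dec = rng.normal(scale=0.1, size=(5, 20))
b_dec = np.zeros(20)

loss, H = sparse_ae_loss(X, W_enc, b_enc, W_dec, b_dec)
```

Under these assumptions the penalty is zero when every hidden unit's mean activation equals `rho` and grows as activations drift away from it, which is what pushes the learned code toward sparsity.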
Pages: 99-105
Number of pages: 7
Related papers
50 in total
  • [41] Improving Prediction of Self-interacting Proteins Using Stacked Sparse Auto-Encoder with PSSM profiles
    Wang, Yan-Bin
    You, Zhu-Hong
    Li, Li-Ping
    Huang, De-Shuang
    Zhou, Feng-Feng
    Yang, Shan
    INTERNATIONAL JOURNAL OF BIOLOGICAL SCIENCES, 2018, 14 (08): : 983 - 991
  • [42] Intrusion detection using deep sparse auto-encoder and self-taught learning
    Qureshi, Aqsa Saeed
    Khan, Asifullah
    Shamim, Nauman
    Durad, Muhammad Hanif
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (08): : 3135 - 3147
  • [43] Semi-supervised fault classification based on dynamic Sparse Stacked auto-encoders model
    Jiang, Li
    Ge, Zhiqiang
    Song, Zhihuan
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2017, 168 : 72 - 83
  • [44] Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data
    Jiang, Xue
    Chen, Miao
    Song, Weichen
    Lin, Guan Ning
    BMC MEDICAL GENOMICS, 2021, 14 (SUPPL 1)
  • [45] Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data
    Xue Jiang
    Miao Chen
    Weichen Song
    Guan Ning Lin
    BMC Medical Genomics, 14
  • [46] Semi-supervised classification using sparse representation for cancer recurrence prediction
    Cui, Yan
    Cai, Xiaodong
    Jin, Zhong
    2013 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS (GENSIPS 2013), 2013, : 102 - 105
  • [47] Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients
    Padegal, Girivinay
    Rao, Murali Krishna
    Ravishankar, Om Amitesh Boggaram
    Acharya, Sathwik
    Athri, Prashanth
    Srinivasa, Gowri
    BMC BIOINFORMATICS, 2023, 24 (01)
  • [48] Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients
    Girivinay Padegal
    Murali Krishna Rao
    Om Amitesh Boggaram Ravishankar
    Sathwik Acharya
    Prashanth Athri
    Gowri Srinivasa
    BMC Bioinformatics, 24
  • [49] Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM
    Alvarez, Marcus
    Rahmani, Elior
    Jew, Brandon
    Garske, Kristina M.
    Miao, Zong
    Benhammou, Jihane N.
    Ye, Chun Jimmie
    Pisegna, Joseph R.
    Pietilainen, Kirsi H.
    Halperin, Eran
    Pajukanta, Paivi
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [50] Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM
    Marcus Alvarez
    Elior Rahmani
    Brandon Jew
    Kristina M. Garske
    Zong Miao
    Jihane N. Benhammou
    Chun Jimmie Ye
    Joseph R. Pisegna
    Kirsi H. Pietiläinen
    Eran Halperin
    Päivi Pajukanta
    Scientific Reports, 10