Deep Learning-based Identification of Cancer or Normal Tissue using Gene Expression Data

Cited by: 0
Authors
Ahn, TaeJin [1]
Goo, Taewan [1, 2]
Lee, Chan-hee [1]
Kim, SungMin [1]
Han, Kyullhee [1]
Park, Sangick [1]
Park, Taesung [3]
Affiliations
[1] Handong Global Univ, Dept Life Sci, Pohang, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Bioinformat, Seoul, South Korea
[3] Seoul Natl Univ, Dept Stat, Seoul, South Korea
Keywords
cancer; deep learning; gene expression; oncogene; addiction;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Background: Deep learning has shown outstanding performance on recognition and classification problems. As increasing amounts of cancer and normal gene expression data become publicly available, deep learning may become an integral tool for efficiently finding specific patterns within massive datasets. We therefore examine the extent to which a machine can learn to recognize cancer. We integrated cancer and normal tissue data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET), and Genotype-Tissue Expression (GTEx) databases, comprising 13,406 cancer and 12,842 normal gene expression samples from 24 different tissues. We first trained a deep neural network (DNN) to discriminate between cancer and normal samples using various gene selection strategies, as well as therapeutic target genes from commercial cancer panels and genes in NCI-curated cancer pathways. We also propose a systematic analysis method for interpreting the trained DNN, and applied it to find the genes that contribute most to classifying an individual sample as cancer.
Results: The best trained DNN classified cancer and normal data with an accuracy of 0.997 on the training set of 13,123 samples (cancer: 6,703; normal: 6,420). On the independent test set of 13,125 samples (cancer: 6,703; normal: 6,422), the DNN achieved an accuracy of 0.979. Using the same training and test data, our DNN outperformed conventional prediction methods, with the support vector machine approach ranking second. For interpretation, we propose a method that extracts each gene's contribution to an individual sample's cancer probability from the trained DNN. This method distinguished samples that depend on one or a few genes, suggesting that these samples are possibly "oncogene addicted".
Conclusion: A deep learning approach in conjunction with our interpretation method is not only a useful tool to identify cancer from gene expression data but can also contribute toward understanding the complex nature of cancer based on large public data.
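The abstract does not say how a gene's contribution to a sample's cancer probability is computed. As a rough illustration only, one common way to get a per-gene, per-sample contribution is occlusion: reset one gene to a baseline value and measure how much the predicted probability drops. The sketch below assumes this occlusion approach and uses a logistic model as a stand-in for the trained DNN; the function names, weights, baseline, and sample values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def cancer_probability(weights, bias, expression):
    """Stand-in classifier (assumption): a logistic model over gene expression."""
    return 1.0 / (1.0 + np.exp(-(expression @ weights + bias)))

def gene_contributions(weights, bias, sample, baseline):
    """Occlusion-style contribution (assumption, not the authors' method):
    the drop in predicted cancer probability when one gene is reset to its
    baseline expression while all other genes are left unchanged."""
    p_full = cancer_probability(weights, bias, sample)
    contributions = np.empty(sample.shape[0])
    for i in range(sample.shape[0]):
        perturbed = sample.copy()
        perturbed[i] = baseline[i]
        contributions[i] = p_full - cancer_probability(weights, bias, perturbed)
    return contributions

# Hypothetical toy values: four genes, one sample.
weights = np.array([3.0, 0.1, -0.2, 0.05])
bias = -2.0
baseline = np.zeros(4)                    # e.g. centered normal-tissue expression
sample = np.array([2.0, 0.5, 0.3, 1.0])

contrib = gene_contributions(weights, bias, sample, baseline)
top_gene = int(np.argmax(contrib))        # index of the most influential gene
```

In this toy sample nearly all of the predicted probability rests on the first gene, which is the kind of single-gene dependence the abstract describes as possibly "oncogene addicted".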
Pages: 1748-1752
Page count: 5
Related papers
50 in total
  • [1] Deep learning-based classification and interpretation of gene expression data from cancer and normal tissues
    Ahn, TaeJin
    Goo, Taewan
    Lee, Chan-Hee
    Kim, SungMin
    Han, Kyullhee
    Park, Sangick
    Park, Taesung
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2020, 24 (02) : 121 - 139
  • [2] Cancer Classification Based on Microarray Gene Expression Data Using Deep Learning
    Guillen, Pablo
    Ebalunode, Jerry
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI), 2016, : 1403 - 1405
  • [3] Deep Learning-Based Prediction of Alzheimer's Disease Using Microarray Gene Expression Data
    Abdelwahab, Mahmoud M.
    Al-Karawi, Khamis A.
    Semary, Hatem E.
    Gulyaeva, Natalia V.
    [J]. BIOMEDICINES, 2023, 11 (12)
  • [4] An effective deep learning-based approach for splice site identification in gene expression
    Ali, Mohsin
    Shah, Dilawar
    Qazi, Shahid
    Khan, Izaz Ahmad
    Abrar, Mohammad
    Zahir, Sana
    [J]. SCIENCE PROGRESS, 2024, 107 (03)
  • [6] Deep learning-based ovarian cancer subtypes identification using multi-omics data
    Guo, Long-Yi
    Wu, Ai-Hua
    Wang, Yong-xia
    Zhang, Li-ping
    Chai, Hua
    Liang, Xue-Fang
    [J]. BIODATA MINING, 2020, 13 (01)
  • [7] Learning-based segmentation framework for tissue images containing gene expression data
    Bello, Musodiq
    Ju, Tao
    Carson, James
    Warren, Joe
    Chiu, Wah
    Kakadiaris, Ioannis A.
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2007, 26 (05) : 728 - 744
  • [8] Lung cancer classification based on enhanced deep learning using gene expression data
    Yuvaraj, V.
    Maheswari, D.
    [J]. Measurement: Sensors, 2023, 30
  • [9] Deep-Learning-Based Cancer Profiles Classification Using Gene Expression Data Profile
    Almarzouki, Hatim Z.
    [J]. JOURNAL OF HEALTHCARE ENGINEERING, 2022, 2022