An integrated ensemble learning technique for gene expression classification and biomarker identification from RNA-seq data for pancreatic cancer prognosis

Cited by: 2
Authors
JagadeeswaraRao G. [1, 2]
Sivaprasad A. [3 ]
Affiliations
[1] AUTDRH, Andhra University, Visakhapatnam
[2] Department of IT, Aditya Institute of Technology and Management, Tekkali
[3] Department of Computer Science, Dr. V.S. Krishna Govt. Degree College, Visakhapatnam
Keywords
Bio-ML; Biomarkers; Ensemble learning; Pancreatic cancer; RNA-seq; WGCNA
DOI
10.1007/s41870-023-01688-8
Abstract
Bio-ML is an interdisciplinary field that applies machine learning (ML) models to biological problems. Ribonucleic acid sequencing (RNA-seq) data expose genetic mutations and the relationships among complex biological processes, which can benefit the diagnosis and treatment of cancer. In this paper, we propose a bio-ML approach to find gene biomarkers in pancreatic cancer (PC). Pancreatic adenocarcinoma (PAAD) gene expression data were obtained from The Cancer Genome Atlas (TCGA) project database. We used two methods. The first is an ensemble stacking classifier with cross-validation (SCV), an ensemble of K-nearest neighbour (KNN), random forest (RF), gradient boosting (GB), and logistic regression (LR) classifiers, for effective classification of differentially expressed genes (DEGs). The second is weighted gene co-expression network analysis (WGCNA), used to find the hub gene module. The genes reported by the two methods were intersected to find common DEGs. These DEGs were analysed using a protein-protein interaction (PPI) network, gene ontology, and pathway analysis to identify eight hub genes. The hub genes were further evaluated using Gene Expression Profiling Interactive Analysis version 2 (GEPIA2), resulting in four novel biomarkers (BUB1, BUB1B, KIF11, and TTK). We believe that integrating ML approaches into biological research produces encouraging results and helps resolve challenging problems. © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
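The two steps named in the abstract, a stacked ensemble evaluated with cross-validation and the intersection of two gene lists, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes scikit-learn, uses a synthetic matrix in place of the TCGA-PAAD expression data, and assumes that KNN, RF, and GB act as base learners with LR as the meta-learner (the abstract does not state how the four classifiers are arranged). All hyperparameters and the gene identifiers `GENE_A` to `GENE_E` are hypothetical placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a (samples x genes) expression matrix with binary labels.
# This is NOT TCGA-PAAD data; it only makes the sketch runnable.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=8, random_state=0
)

# Assumed arrangement: KNN, RF, and GB as base learners.
# KNN is distance-based, so it is wrapped with a scaler.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Assumed meta-learner: logistic regression trained on out-of-fold
# predictions of the base learners (cv=5 controls those inner folds).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Outer cross-validation to estimate the accuracy of the whole stack.
scores = cross_val_score(stack, X, y, cv=3)
mean_accuracy = scores.mean()
print(f"Mean cross-validated accuracy: {mean_accuracy:.3f}")

# Intersection step: genes reported by both methods.
# Both sets hold hypothetical placeholder identifiers, not results from the paper.
ml_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
wgcna_genes = {"GENE_B", "GENE_D", "GENE_E"}
common_genes = sorted(ml_genes & wgcna_genes)
print("Common genes:", common_genes)
```

In a real analysis the common gene set would then feed the downstream PPI network, gene ontology, and pathway analyses described in the abstract.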
Pages: 1505-1516
Number of pages: 11