An integrated ensemble learning technique for gene expression classification and biomarker identification from RNA-seq data for pancreatic cancer prognosis

Cited by: 2
Authors
JagadeeswaraRao G. [1, 2]
Sivaprasad A. [3]
Affiliations
[1] AUTDRH, Andhra University, Visakhapatnam
[2] Department of IT, Aditya Institute of Technology and Management, Tekkali
[3] Department of Computer Science, Dr. V.S. Krishna Govt. Degree College, Visakhapatnam
Keywords
Bio-ML; Biomarkers; Ensemble learning; Pancreatic cancer; RNA-seq; WGCNA
DOI
10.1007/s41870-023-01688-8
Abstract
Bio-ML is an interdisciplinary field that applies machine learning (ML) models to biological problems. Ribonucleic acid sequencing (RNA-seq) data reveal genetic mutations and complex relationships among biological processes, which can inform the diagnosis and treatment of cancer. In this paper, we propose a bio-ML approach to identify gene biomarkers in pancreatic cancer (PC). Pancreatic adenocarcinoma (PAAD) gene expression data were obtained from The Cancer Genome Atlas (TCGA) project database. We used two methods: (1) an ensemble stacking classifier with cross-validation (SCV), which combines K-nearest neighbour (KNN), random forest (RF), gradient boosting (GB), and logistic regression (LR) classifiers, for effective classification of differentially expressed genes (DEGs); and (2) weighted gene co-expression network analysis (WGCNA) to find the hub gene module. The genes reported by the two methods were intersected to obtain common DEGs. These DEGs were analysed using protein-protein interaction (PPI) network, gene ontology (GO), and pathway analyses, which identified eight hub genes. Further evaluation of these hub genes with Gene Expression Profiling Interactive Analysis version 2 (GEPIA2) yielded four novel biomarkers (BUB1, BUB1B, KIF11, and TTK). We believe that integrating ML approaches into biological research produces encouraging results and helps resolve challenging problems. © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
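The stacking step named in the abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which of the four models acts as the meta-learner, so here LR combines the out-of-fold predictions of KNN, RF, and GB. The hyperparameters and the synthetic samples-by-genes matrix are placeholders, not the TCGA PAAD data.

```python
# Minimal sketch of a stacking ensemble with cross-validated meta-features.
# Assumptions (not from the paper): scikit-learn, LR as the meta-learner,
# default-ish hyperparameters, synthetic data instead of TCGA PAAD expression.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacking_classifier(cv: int = 5) -> StackingClassifier:
    """Stack KNN, RF and GB base learners; LR combines their predictions."""
    base_learners = [
        # KNN is distance-based, so standardize expression values first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        # The meta-learner is trained on out-of-fold base-learner predictions.
        cv=cv,
    )


if __name__ == "__main__":
    # Synthetic stand-in: 120 samples x 50 "genes", two classes.
    X, y = make_classification(n_samples=120, n_features=50,
                               n_informative=8, random_state=0)
    scores = cross_val_score(build_stacking_classifier(), X, y, cv=5)
    print(f"mean CV accuracy: {scores.mean():.3f}")
```

Using out-of-fold predictions (`cv=5`) keeps the meta-learner from fitting to base-learner outputs that have already seen the same samples, which is the usual reason for pairing stacking with cross-validation.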
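The core quantities behind WGCNA can be sketched in a few lines of NumPy, assuming an unsigned network. The gene-gene adjacency is the absolute Pearson correlation raised to a soft-thresholding power, $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$, and a gene's connectivity is $k_i = \sum_{j \neq i} a_{ij}$. The sketch omits the topological overlap matrix, module detection, and the scale-free-fit choice of $\beta$; $\beta = 6$ and the helper names are assumed placeholders, not the paper's settings.

```python
# Minimal sketch of WGCNA-style soft-thresholded adjacency and connectivity.
# Assumptions: unsigned network, beta=6, no TOM or module detection;
# helper names are hypothetical.
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """expr is a samples x genes matrix; returns a genes x genes adjacency."""
    corr = np.corrcoef(expr, rowvar=False)  # gene-gene Pearson correlation
    adj = np.abs(corr) ** beta              # soft thresholding (unsigned)
    np.fill_diagonal(adj, 0.0)              # no self-connections
    return adj


def connectivity(adj: np.ndarray) -> np.ndarray:
    """Connectivity k_i is the row sum of the adjacency matrix."""
    return adj.sum(axis=1)


def top_connected_genes(expr, gene_names, beta=6, top_n=5):
    """Return the top_n genes by connectivity (candidate hubs)."""
    k = connectivity(soft_threshold_adjacency(expr, beta))
    order = np.argsort(k)[::-1][:top_n]
    return [gene_names[i] for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=(40, 1))
    co_expressed = signal + 0.1 * rng.normal(size=(40, 3))  # three co-expressed genes
    unrelated = rng.normal(size=(40, 7))                     # seven unrelated genes
    expr = np.hstack([co_expressed, unrelated])
    names = [f"g{i}" for i in range(10)]
    print(top_connected_genes(expr, names, top_n=3))
```

Raising correlations to a power suppresses weak, noise-level correlations while keeping strong ones, so the three co-expressed genes in the toy data dominate the connectivity ranking.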
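The intersection and hub-gene steps can be illustrated with plain Python. This is a hypothetical sketch: it takes the genes reported by both methods and ranks them by degree in a PPI network, counting only interactions between the common genes. The gene names and edges are illustrative placeholders, not the paper's data. In practice, edges would come from a PPI resource, and the paper also uses GO and pathway analyses that are not modeled here.

```python
# Hypothetical sketch: intersect two gene lists, then rank the common genes
# by degree in an undirected PPI network. Gene names and edges are placeholders.
from collections import Counter


def common_genes(classifier_genes, module_genes):
    """Genes reported by both the classifier step and the WGCNA module."""
    return sorted(set(classifier_genes) & set(module_genes))


def rank_by_degree(genes, ppi_edges, top_n):
    """Rank genes by the number of PPI partners that are also in `genes`."""
    gene_set = set(genes)
    # Treat edges as undirected; drop duplicates and self-loops.
    edges = {frozenset((a, b)) for a, b in ppi_edges if a != b}
    degree = Counter()
    for edge in edges:
        if edge <= gene_set:  # both endpoints are common genes
            for gene in edge:
                degree[gene] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    ranked = sorted(gene_set, key=lambda g: (-degree[g], g))
    return ranked[:top_n]


if __name__ == "__main__":
    clf_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    wgcna_genes = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
    edges = [("GENE_B", "GENE_C"), ("GENE_C", "GENE_D"),
             ("GENE_C", "GENE_E"), ("GENE_A", "GENE_B")]
    shared = common_genes(clf_genes, wgcna_genes)
    print(shared)                                 # ['GENE_B', 'GENE_C', 'GENE_D']
    print(rank_by_degree(shared, edges, top_n=2))  # ['GENE_C', 'GENE_B']
```

Restricting the degree count to the common genes mirrors building a PPI subnetwork from the DEG list alone. Edges to genes outside the list (here `GENE_E` and `GENE_A`) do not raise a gene's rank.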
Pages: 1505-1516
Page count: 12