Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis

Cited by: 50
Authors
Su, Ying [1]
Tian, Xuecong [1]
Gao, Rui [3]
Guo, Wenjia [2]
Chen, Cheng [1]
Chen, Chen [3, 4]
Jia, Dongfang [1]
Li, Hongtao [2]
Lv, Xiaoyi [1, 5]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830046, Xinjiang, Peoples R China
[2] Xinjiang Med Univ, Affiliated Tumor Hosp, Urumqi 830011, Peoples R China
[3] Xinjiang Med Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[4] Cloud Comp Engn Technol Res Ctr Xinjiang, Kelamayi 834099, Peoples R China
[5] Xinjiang Univ, Key Lab Signal Detect & Proc, Urumqi 830046, Xinjiang, Peoples R China
Keywords
Machine learning; Colon cancer; Prognosis; WGCNA; Staging; PPI; Gene expression
DOI
10.1016/j.compbiomed.2022.105409
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Advanced metastasis makes colon cancer more difficult to treat. Identifying markers of colon cancer allows the stage of the disease to be diagnosed promptly, so that timely treatment can improve prognosis. This paper uses gene expression profiling data from The Cancer Genome Atlas (TCGA) to diagnose colon cancer and its stage. We first selected the gene modules most strongly correlated with cancer by Weighted Gene Co-expression Network Analysis (WGCNA), extracted characteristic genes from the differential expression results using the least absolute shrinkage and selection operator (Lasso) algorithm, and performed survival analysis. The module genes were then combined with the Lasso-extracted feature genes to distinguish colon cancer from healthy controls using random forest (RF), support vector machine (SVM) and decision tree classifiers, and colon cancer stage was diagnosed using the differentially expressed genes for each stage. Protein-protein interaction (PPI) networks were then constructed for 289 genes to identify clusters of aggregated proteins for survival analysis. The RF model gave the best results. In cross-validated diagnosis of colon cancer versus controls it reached an average accuracy of 99.81%, an F1 value of 0.9968, a precision of 99.88% and a recall of 99.5%; in the diagnosis of colon cancer stages I, II, III and IV it reached an average accuracy of 91.5%, an F1 value of 0.7679, a precision of 86.94% and a recall of 73.04%. Eight genes associated with colon cancer prognosis were identified: GCNT2, GLDN, SULT1B1, UGT2B15, PTGDR2, GPR15, BMP5 and CPT2.
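The reported metrics can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes that the 99.88% and 86.94% figures quoted next to the average accuracies are precision values, and that F1 is the harmonic mean of precision and recall. The function names and the per-class counts in the second part are hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Cancer vs. control: values taken from the abstract (assumed to be precision and recall).
binary_f1 = f1_from(0.9988, 0.9950)
print(f"binary F1 from P and R: {binary_f1:.4f}")  # compare with the reported 0.9968

# Staging (stages I-IV): the same formula applied to the reported averages.
staging_f1 = f1_from(0.8694, 0.7304)
print(f"staging F1 from averaged P and R: {staging_f1:.4f}")  # compare with the reported 0.7679

# Hypothetical two-class counts (tp, fp, fn), NOT from the paper, showing that a
# macro-averaged F1 (mean of per-class F1) can differ from the harmonic mean of
# macro-averaged precision and recall.
hypothetical_counts = [(90, 10, 10), (10, 1, 30)]
per_class = [precision_recall(*c) for c in hypothetical_counts]
macro_p = sum(p for p, _ in per_class) / len(per_class)
macro_r = sum(r for _, r in per_class) / len(per_class)
macro_f1 = sum(f1_from(p, r) for p, r in per_class) / len(per_class)
harmonic_of_macros = f1_from(macro_p, macro_r)
print(f"macro F1: {macro_f1:.4f}, harmonic mean of macro P and R: {harmonic_of_macros:.4f}")
```

Under these assumptions the binary figures are mutually consistent to within rounding. The staging F1 is lower than the harmonic mean of the reported precision and recall, which would be expected if the four-stage metrics were averaged per class; the abstract does not state the averaging scheme.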
Pages: 10