Predictors of colorectal cancer survival using cox regression and random survival forests models based on gene expression data

Cited by: 15
Authors
Mohammed, Mohanad [1, 2]
Mboya, Innocent B. [1, 3]
Mwambi, Henry [1]
Elbashir, Murtada K. [4]
Omolo, Bernard [1, 5, 6]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Pietermaritzburg, South Africa
[2] Univ Gezira, Fac Math & Comp Sci, Wad Madani, Sudan
[3] Kilimanjaro Christian Med Univ Coll KCMUCo, Dept Epidemiol & Biostat, Moshi, Tanzania
[4] Jouf Univ, Coll Comp & Informat Sci, Sakaka, Saudi Arabia
[5] Univ South Carolina Upstate, Div Math & Comp Sci, Spartanburg, SC USA
[6] Univ Witwatersrand, Fac Hlth Sci, Sch Publ Hlth, Johannesburg, South Africa
Source
PLOS ONE | 2021 / Volume 16 / Issue 12
Keywords
MULTIPLE IMPUTATION; BIOMARKERS
DOI
10.1371/journal.pone.0261625
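Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract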
Understanding and identifying the markers and clinical information associated with colorectal cancer (CRC) patient survival is needed for early detection and diagnosis. In this work, we aimed to build a simple model using Cox proportional hazards (PH) and random survival forests (RSF) and to find a robust signature for predicting CRC overall survival. We used stepwise regression to develop a Cox PH model to analyse 54 common differentially expressed genes from three mutations. RSF was applied using the log-rank and log-rank-score splitting rules based on 5000 survival trees, and variable importance was then obtained to find the genes most influential for CRC survival. We compared the predictive performance of the Cox PH model and RSF for early CRC detection and diagnosis. The results indicate that the SLC9A8, IER5, ARSJ, ANKRD27, and PIPOX genes were significantly associated with CRC overall survival. In addition, age, sex, and stage also affected CRC overall survival. The RSF model using log-rank performed better than the one using log-rank-score, while log-rank-score needed more trees to stabilize. Overall, the imputation of missing values enhanced the model's predictive performance. In addition, the predictive performance of the Cox PH model was better than that of RSF.
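For readers unfamiliar with the log-rank rule mentioned in the abstract, the following is a minimal sketch of the standardized two-group log-rank statistic that a survival tree can use to score a candidate split. It is an illustration only, not the authors' code: the function name `logrank_statistic`, the argument `in_left`, and the toy data are all hypothetical.

```python
import math


def logrank_statistic(times, events, in_left):
    """Absolute standardized log-rank statistic for a two-group split.

    times   : follow-up time per subject
    events  : 1 if the event (death) was observed, 0 if censored
    in_left : True if the subject falls in the left child (hypothetical name)
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Only distinct times at which at least one event occurred contribute.
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Subjects still at risk just before time t, overall and in the left child.
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        # Events occurring exactly at time t, overall and in the left child.
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left) if u == t and e and g
        )
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of the left-child event count at time t.
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return abs(observed_minus_expected) / math.sqrt(variance)


# Toy data (made up): six subjects, all events observed.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
separating_split = [True, True, True, False, False, False]
interleaved_split = [True, False, True, False, True, False]

z_separating = logrank_statistic(toy_times, toy_events, separating_split)
z_interleaved = logrank_statistic(toy_times, toy_events, interleaved_split)
```

In this toy example the split that separates early deaths from late deaths scores higher than the interleaved split, which is the behaviour a log-rank splitting rule relies on when choosing among candidate splits.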
Pages: 22
Related papers
50 in total
  • [41] Visualising survival data regression models using pseudo-observations
    Perme, Maja Pohar
    Andersen, Per Kragh
    PROCEEDINGS OF THE ITI 2008 30TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2008, : 377 - +
  • [42] Pathway-gene identification for pancreatic cancer survival via doubly regularized Cox regression
    Gong, Haijun
    Wu, Tong Tong
    Clarke, Edmund M.
    BMC SYSTEMS BIOLOGY, 2014, 8 : S3
  • [43] Comparison of semiparametric regression models for correlated survival data using simulations
    Lorino, T
    Sanaa, M
    Robin, S
    Daudin, JJ
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2004, 33 (08) : 1975 - 1991
  • [44] Optimal microRNA Sequencing Depth to Predict Cancer Patient Survival with Random Forest and Cox Models
    Jardillier, Remy
    Koca, Dzenis
    Chatelain, Florent
    Guyon, Laurent
    GENES, 2022, 13 (12)
  • [45] Comparison of Bayesian survival analysis and Cox regression analysis in simulated and breast cancer data sets
    Omurlu, Imran Kurt
    Ozdamar, Kazim
    Ture, Mevlut
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (08) : 11341 - 11346
  • [46] Prediction of disease-free survival in gastric cancer using gene expression data
    Boussioutas, A
    Van Laar, R
    Desmond, P
    Bowtell, D
    GASTROENTEROLOGY, 2003, 124 (04) : A554 - A555
  • [47] Modelling population-based cancer survival trends by using join point models for grouped survival data
    Yu, Binbing
    Huang, Lan
    Tiwari, Ram C.
    Feuer, Eric J.
    Johnson, Karen A.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2009, 172 : 405 - 425
  • [48] Survival ensemble with sparse random projections for censored clinical and gene expression data
    Zhou L.
    Wang H.
    Xu Q.
    Wang, Hong (wh@csu.edu.cn), 1600, Information Processing Society of Japan (09): : 18 - 23
  • [49] Classification of miRNA Expression Data Using Random Forests for Cancer Diagnosis
    Razak, Eliza
    Yusorf, Faridah
    Raus, Raha Ahmad
    PROCEEDINGS OF 6TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING (ICCCE 2016), 2016, : 187 - 190
  • [50] Comparing parametric and Cox regression models using HIV/AIDS survival data from a retrospective study in Ntcheu district in Malawi
    Dzinza, Rabson
    Ngwira, Alfred
    JOURNAL OF PUBLIC HEALTH RESEARCH, 2022, 11 (03)