Assessment of performance of survival prediction models for cancer prognosis

Cited by: 54
Authors
Chen, Hung-Chia [1]
Kodell, Ralph L. [3]
Cheng, Kuang Fu [2]
Chen, James J. [1,2]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Div Bioinformat & Biostat, Jefferson, AR 72079 USA
[2] China Med Univ, Sch Publ Hlth, Ctr Biostat, Taichung, Taiwan
[3] Univ Arkansas Med Sci, Dept Biostat, Little Rock, AR 72205 USA
Keywords
CELL-LUNG-CANCER; GENE-EXPRESSION SIGNATURES; HIGH-DIMENSIONAL DATA; CROSS-VALIDATION; BREAST-CANCER; RISK STRATIFICATION; MICROARRAY DATA; LYMPHOMA; CLASSIFIERS; RECURRENCE
DOI
10.1186/1471-2288-12-102
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health administration)]
Abstract
Background: Cancer survival studies are commonly analyzed using survival-time prediction models for cancer prognosis. A number of different performance metrics are used to ascertain the concordance between the predicted risk score of each patient and the actual survival time, but these metrics can sometimes conflict. Alternatively, patients are sometimes divided into two classes according to a survival-time threshold, and binary classifiers are applied to predict each patient's class. Although this approach has several drawbacks, it does provide natural performance metrics, such as positive and negative predictive values, that enable unambiguous assessments.
Methods: We compare the survival-time prediction and survival-time threshold approaches to analyzing cancer survival studies. We review and compare common performance metrics for the two approaches. We present new randomization tests and cross-validation methods to enable unambiguous statistical inferences for several performance metrics used with the survival-time prediction approach. We consider five survival prediction models: one clinical model, two gene expression models, and two models that combine the clinical and gene expression models.
Results: A public breast cancer dataset was used to compare several performance metrics across the five prediction models. 1) For some prediction models, the hazard ratio from fitting a Cox proportional hazards model was significant but the two-group comparison was not, and vice versa. 2) The randomization test and cross-validation were generally consistent with the p-values obtained from the standard performance metrics. 3) Binary classifiers depended heavily on how the risk groups were defined; a slight change in the survival threshold used to assign classes led to very different prediction results.
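The survival-time threshold approach described above can be illustrated with a minimal Python sketch. It assumes one common convention that the abstract does not specify: a patient is "positive" (short survival) if the event occurs before the threshold, "negative" if followed up to at least the threshold, and excluded if censored before it. The names `threshold_class` and `ppv_npv`, and the toy data, are hypothetical and for illustration only.

```python
from typing import Optional, Sequence, Tuple


def threshold_class(time: float, event: bool, threshold: float) -> Optional[int]:
    """Assign a survival class relative to a time threshold (assumed convention).

    Returns 1 for an observed event before the threshold (short survival),
    0 for follow-up reaching the threshold (long survival), and None when the
    patient was censored before the threshold, so the class is unknown.
    """
    if time < threshold:
        return 1 if event else None
    return 0


def ppv_npv(times: Sequence[float], events: Sequence[bool],
            predicted: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Positive and negative predictive values of binary predictions
    (1 = predicted short survival), skipping patients with unknown class."""
    tp = fp = tn = fn = 0
    for t, e, p in zip(times, events, predicted):
        y = threshold_class(t, e, threshold)
        if y is None:
            continue  # censored before the threshold: excluded
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical toy data: survival times, event indicators, predicted labels.
    times = [2, 3, 4, 6, 7, 9]
    events = [True, True, False, True, False, True]
    predicted = [1, 1, 0, 0, 1, 0]
    print(ppv_npv(times, events, predicted, 5.0))  # (0.6666666666666666, 1.0)
    print(ppv_npv(times, events, predicted, 6.5))  # (0.6666666666666666, 0.5)
```

With the predictions held fixed, moving the threshold from 5.0 to 6.5 reclassifies the patient who died at time 6 as a short survivor, which halves the NPV. This is a small-scale version of the threshold sensitivity reported in Results item 3.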
Conclusions: 1) Different performance metrics for evaluating a survival prediction model may lead to different conclusions about its discriminatory ability. 2) Evaluation based on a high-risk versus low-risk group comparison depends on the selected risk-score threshold; a plot of p-values across all possible thresholds can show how sensitive the result is to the choice of threshold. 3) A randomization test of the significance of Somers' rank correlation can be used to further evaluate the performance of a prediction model. 4) The cross-validated power of survival prediction models decreases as the training and test sets become less balanced.
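Conclusion 3 refers to a randomization test of Somers' rank correlation. The following minimal Python sketch is a generic permutation test and not necessarily the authors' exact procedure. It assumes Somers' D is obtained from Harrell's concordance index as D = 2C - 1, where a pair is comparable when the patient with the shorter time had an observed event (tied times are simply skipped) and tied risk scores count one half. The null distribution comes from randomly permuting risk scores across patients, and the two-sided p-value uses the add-one correction. All function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[bool],
              scores: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs in which the patient who
    failed earlier has the higher risk score (ties in score count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # an earlier censored time cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # strict: tied times are skipped here
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def somers_d(times: Sequence[float], events: Sequence[bool],
             scores: Sequence[float]) -> float:
    """Somers' D rescales C from [0, 1] to [-1, 1]; 0 means no discrimination."""
    return 2.0 * harrell_c(times, events, scores) - 1.0


def randomization_test(times: Sequence[float], events: Sequence[bool],
                       scores: Sequence[float], n_perm: int = 2000,
                       seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation p-value for H0: risk scores are unrelated to survival."""
    rng = random.Random(seed)
    observed = somers_d(times, events, scores)
    shuffled = list(scores)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between score and survival
        if abs(somers_d(times, events, shuffled)) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical data: higher risk scores go with shorter survival.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [True, False, True, True, False, True, True, True]
    scores = [8, 6, 7, 5, 4, 3, 2, 1]
    print(randomization_test(times, events, scores, n_perm=1000, seed=42))
```

Permuting the scores while keeping each patient's (time, event) pair intact preserves the censoring pattern under the null hypothesis. This is why a randomization test can be preferable here to a normal approximation for D.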
Pages: 11