External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients

Cited by: 21
Authors
Eertink, Jakoba J. [1 ,2 ]
Heymans, Martijn W. [3 ,4 ]
Zwezerijnen, Gerben J. C. [2 ,5 ]
Zijlstra, Josee M. [1 ,2 ]
de Vet, Henrica C. W. [3 ,4 ]
Boellaard, Ronald [2 ,5 ]
Affiliations
[1] Amsterdam UMC Locat Vrije Univ Amsterdam, Dept Hematol, De Boelelaan 1117, NL-1081 HV Amsterdam, Netherlands
[2] Canc Ctr Amsterdam, Imaging & Biomarkers, Amsterdam, Netherlands
[3] Amsterdam UMC Locat Vrije Univ Amsterdam, Epidemiol & Data Sci, Amsterdam, Netherlands
[4] Amsterdam Publ Hlth Res Inst, Methodol, Amsterdam, Netherlands
[5] Amsterdam UMC Locat Vrije Univ Amsterdam, Radiol & Nucl Med, Amsterdam, Netherlands
Keywords
Internal validation; External validation; Model performance; CV-AUC;
DOI
10.1186/s13550-022-00931-w
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Subject Classification Codes
1002; 100207; 1009
Abstract
Aim: Clinical prediction models need to be validated. In this study, we used simulated data to compare various internal and external validation approaches.
Methods: Data of 500 patients were simulated using the distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status, and age of 296 diffuse large B cell lymphoma patients. These data were used to predict progression after 2 years with an existing logistic regression model. On the simulated data, we applied cross-validation, bootstrapping, and holdout (n = 100). We simulated new external datasets (n = 100, n = 200, n = 500) and additionally (1) simulated stage-specific external datasets, (2) varied the cut-off for high-risk patients, (3) varied the false positive and false negative rates, and (4) simulated a dataset with EARL2 characteristics. All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and the calibration slope.
Results: Cross-validation (0.71 ± 0.06) and holdout (0.70 ± 0.07) yielded comparable model performance, but the estimate was more uncertain with a holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable across these internal validation approaches. Increasing the size of the test set resulted in more precise CV-AUC estimates and a smaller SD for the calibration slope. For test datasets restricted to specific stages, the CV-AUC increased with higher Ann Arbor stage. As expected, changing the cut-off for high risk and the false positive and false negative rates affected model performance, as clearly shown by the low calibration slope. The EARL2 dataset yielded similar model performance and precision, but the calibration slope indicated overfitting.
Conclusion: With small datasets, it is not advisable to use a holdout set or a very small external dataset with similar characteristics, because a single small test dataset suffers from large uncertainty. Repeated cross-validation using the full training dataset is therefore preferred. Our simulations also demonstrated the importance of considering differences in patient population between training and test data, which may call for adjustment or stratification of relevant variables.
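The abstract's central point, that a single small holdout yields a noisier performance estimate than evaluating on the full dataset, can be illustrated with a minimal stdlib-only Python sketch. The features, coefficients, and prevalence here are illustrative assumptions, not the published DLBCL model; only the study design (a fixed logistic model, n = 500 simulated patients, a holdout of n = 100, 100 repeats) follows the abstract:

```python
import math
import random
import statistics

# Hypothetical coefficients for an *existing* (fixed) logistic model;
# these are assumptions for illustration, not the published model.
COEFS = {"mtv": 0.8, "suv": 0.5, "age": 0.4}
INTERCEPT = -1.0

def simulate(n, rng):
    """Simulate n patients as (linear predictor, binary outcome) pairs."""
    patients = []
    for _ in range(n):
        x = {k: rng.gauss(0.0, 1.0) for k in COEFS}
        lp = INTERCEPT + sum(COEFS[k] * x[k] for k in COEFS)
        p = 1.0 / (1.0 + math.exp(-lp))  # logistic link
        patients.append((lp, 1 if rng.random() < p else 0))
    return patients

def auc(scored):
    """Rank-based AUC: probability a positive outscores a negative."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)
holdout_aucs, full_aucs = [], []
for _ in range(100):                      # 100 simulation repeats
    data = simulate(500, rng)
    holdout_aucs.append(auc(data[:100]))  # single small test set (n = 100)
    full_aucs.append(auc(data))           # evaluate on all 500 patients
print("holdout AUC SD: ", round(statistics.stdev(holdout_aucs), 3))
print("full-set AUC SD:", round(statistics.stdev(full_aucs), 3))
```

Across the repeats, the spread (SD) of the holdout AUC is markedly larger than the spread of the full-dataset AUC, mirroring the paper's observation that a single small test set gives an uncertain performance estimate.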
Pages: 8
Related Papers
24 records in total
  • [11] External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges
    Riley, Richard D.
    Ensor, Joie
    Snell, Kym I. E.
    Debray, Thomas P. A.
    Altman, Doug G.
    Moons, Karel G. M.
    Collins, Gary S.
    [J]. BMJ-BRITISH MEDICAL JOURNAL, 2016, 353
  • [12] Using ICD-9 diagnostic codes for external validation of topic models derived from primary care electronic medical record clinical text data
    Meaney, Christopher
    Escobar, Michael
    Stukel, Therese A.
    Austin, Peter C.
    Kalia, Sumeet
    Aliarzadeh, Babak
    Moineddin, Rahim
    Greiver, Michelle
    [J]. HEALTH INFORMATICS JOURNAL, 2023, 29 (01)
  • [13] Development and external validation study combining existing models and recent data into an up-to-date prediction model for evaluating kidneys from older deceased donors for transplantation
    Ramspek, Chava L.
    El Moumni, Mostafa
    Wali, Eelaha
    Heemskerk, Martin B. A.
    Pol, Robert A.
    Crop, Meindert J.
    Jansen, Nichon E.
    Hoitsma, Andries
    Dekker, Friedo W.
    van Diepen, M.
    Moers, Cyril
    [J]. KIDNEY INTERNATIONAL, 2021, 99 (06) : 1459 - 1469
  • [14] External validation and comparison of three prediction tools for risk of osteoporotic fractures using data from population based electronic health records: retrospective cohort study
    Dagan, Noa
    Cohen-Stavi, Chandra
    Leventer-Roberts, Maya
    Balicer, Ran D.
    [J]. BMJ-BRITISH MEDICAL JOURNAL, 2017, 356
  • [15] External validation of the risk-prediction model for hepatocellular carcinoma (HCC) from the REVEAL-HCV study using data from the US Veterans Affairs (VA) health system
    Matsuda, T.
    McCombs, J.
    Lee, M-H.
    Tonnu-Mihara, I.
    L'italien, G.
    Saab, S.
    Hines, P.
    Yuan, Y.
    [J]. JOURNAL OF HEPATOLOGY, 2014, 60 (01) : S16 - S16
  • [16] Validation of an early vascular aging construct model for comprehensive cardiovascular risk assessment using external risk indicators for improved clinical utility: data from the EVasCu study
    Cavero-Redondo, Ivan
    Saz-Lara, Alicia
    Martinez-Garcia, Irene
    Otero-Luis, Iris
    Martinez-Rodrigo, Arturo
    [J]. CARDIOVASCULAR DIABETOLOGY, 2024, 23 (01)
  • [18] Outcome prediction in pediatric fever in neutropenia: Development of clinical decision rules and external validation of published rules based on data from the prospective multicenter SPOG 2015 FN definition study
    Santschi, Marina
    Ammann, Roland A.
    Agyeman, Philipp K. A.
    Ansari, Marc
    Bodmer, Nicole
    Brack, Eva
    Koenig, Christa
    [J]. PLOS ONE, 2023, 18 (08):
  • [19] Outcome prediction in pediatric fever in neutropenia: Development of clinical decision rules and external validation of published rules based on data from the prospective multicenter SPOG 2015 FN Definition Study
    Santschi, Marina
    Ammann, Roland A.
    Agyeman, Philipp K. A.
    Ansari, Marc
    Bodmer, Nicole
    Brack, Eva
    Koenig, Christa
    [J]. SWISS MEDICAL WEEKLY, 2023, 153 : 15S - 16S
  • [20] Using Iterative Pairwise External Validation to Contextualize Prediction Model Performance: A Use Case Predicting 1-Year Heart Failure Risk in Patients with Diabetes Across Five Data Sources
    Williams, Ross D.
    Reps, Jenna M.
    Kors, Jan A.
    Ryan, Patrick B.
    Steyerberg, Ewout
    Verhamme, Katia M.
    Rijnbeek, Peter R.
    [J]. DRUG SAFETY, 2022, 45 (05) : 563 - 570