Minimum sample size for developing a multivariable prediction model: Part II - binary and time-to-event outcomes

Cited by: 490
Authors
Riley, Richard D. [1]
Snell, Kym I. E. [1]
Ensor, Joie [1]
Burke, Danielle L. [1]
Harrell, Frank E., Jr. [2]
Moons, Karel G. M. [3]
Collins, Gary S. [4]
Affiliations
[1] Keele Univ, Res Inst Primary Care & Hlth Sci, Ctr Prognosis Res, Keele ST5 5BG, Staffs, England
[2] Vanderbilt Univ, Sch Med, Dept Biostat, Nashville, TN 37212 USA
[3] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
[4] Univ Oxford, Ctr Stat Med, Nuffield Dept Orthopaed Rheumatol & Musculoskelet, Oxford, England
Keywords
binary and time-to-event outcomes; logistic and Cox regression; multivariable prediction model; pseudo R-squared; sample size; shrinkage; PROPORTIONAL-HAZARDS; CARDIOVASCULAR RISK; PROGNOSTIC INDEX; REGRESSION; SIMULATION; LIKELIHOOD; NUMBER; PROBABILITY; VALIDATION; DERIVATION;
DOI
10.1002/sim.7992
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates, as defined by a global shrinkage factor of >= 0.9; (ii) a small absolute difference of <= 0.05 between the model's apparent and adjusted Nagelkerke's R-squared; and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R-squared, which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8, and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
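The three criteria in the abstract can be illustrated for a binary outcome with a minimal Python sketch. The abstract itself gives no formulas, so the closed-form expressions below are assumptions taken from how this approach is commonly stated, and should be checked against the full paper before use. The function names, the 0.05 margin of error for the overall risk, and the input values (p = 24, Cox-Snell R-squared = 0.288, prevalence = 0.5) are hypothetical and chosen only for illustration. Time-to-event outcomes are not covered.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i): n giving an expected global shrinkage factor >= `shrinkage`.

    Assumed formula: n = p / ((S - 1) * ln(1 - R2_CS / S)).
    """
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage)))


def max_r2_cs_binary(prevalence):
    """Largest attainable Cox-Snell R-squared for a binary outcome.

    Assumed formula: 1 - exp(2 * lnL_null / n), with the null
    log-likelihood per participant determined by the outcome prevalence.
    """
    ln_l_null_per_n = (prevalence * math.log(prevalence)
                       + (1.0 - prevalence) * math.log(1.0 - prevalence))
    return 1.0 - math.exp(2.0 * ln_l_null_per_n)


def n_for_r2_difference(p, r2_cs, prevalence, delta=0.05):
    """Criterion (ii): apparent minus adjusted Nagelkerke R-squared <= delta.

    The difference is converted to a required shrinkage factor, which is
    then passed to the criterion (i) formula.
    """
    s = r2_cs / (r2_cs + delta * max_r2_cs_binary(prevalence))
    return n_for_shrinkage(p, r2_cs, shrinkage=s)


def n_for_overall_risk(prevalence, margin=0.05, z=1.96):
    """Criterion (iii): estimate the overall risk to within +/- margin.

    The 0.05 margin is an assumption; the abstract only asks for
    'precise estimation'.
    """
    return math.ceil((z / margin) ** 2 * prevalence * (1.0 - prevalence))


def minimum_sample_size(p, r2_cs, prevalence):
    """Take the largest n across the three criteria, then derive E and EPP."""
    n = max(
        n_for_shrinkage(p, r2_cs),
        n_for_r2_difference(p, r2_cs, prevalence),
        n_for_overall_risk(prevalence),
    )
    events = n * prevalence
    return {"n": n, "events": events, "epp": events / p}


if __name__ == "__main__":
    # Illustrative inputs only, not values reported in the abstract.
    print(minimum_sample_size(p=24, r2_cs=0.288, prevalence=0.5))
```

Because the required n depends on the anticipated Cox-Snell R-squared and the outcome prevalence as well as p, the resulting EPP varies from study to study, which is the abstract's argument against a fixed rule such as 10 EPP.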
Pages: 1276-1296
Number of pages: 21