A simulation study of the number of events per variable in logistic regression analysis

Cited by: 6009
Authors
Peduzzi, P
Concato, J
Kemper, E
Holford, TR
Feinstein, AR
Affiliations
[1] YALE UNIV,SCH MED,DEPT MED,CLIN EPIDEMIOL UNIT,NEW HAVEN,CT 06510
[2] YALE UNIV,SCH MED,DEPT EPIDEMIOL & PUBL HLTH,NEW HAVEN,CT 06510
[3] VET ADM MED CTR,MED SERV,W HAVEN,CT 06516
Keywords
Monte Carlo; bias; precision; significance testing
DOI
10.1016/S0895-4356(96)00236-3
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data from a cardiac trial of 673 patients in which 252 deaths occurred and seven variables were cogent predictors of mortality; the number of events per predictive variable was (252/7 =) 36 for the full sample. For the simulations, at values of EPV = 2, 5, 10, 15, 20, and 25, we randomly generated 500 samples of the 673 patients, chosen with replacement, according to a logistic model derived from the full sample. Simulation results for the regression coefficients for each variable in each group of 500 samples were compared for bias, precision, and significance testing against the results of the model fitted to the original sample. For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions; the large sample variance estimates from the logistic model both overestimated and underestimated the sample variance of the regression coefficients; the 90% confidence limits about the estimated values did not have proper coverage; the Wald statistic was conservative under the null hypothesis; and paradoxical associations (significance in the wrong direction) were increased. Although other factors (such as the total number of events, or sample size) may influence the validity of the logistic model, our findings indicate that low EPV can lead to major problems. Copyright (C) 1996 Elsevier Science Inc.
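The EPV arithmetic in the abstract can be illustrated with a minimal sketch. The function names `events_per_variable` and `events_needed` are hypothetical helpers introduced here for illustration; they are not from the paper, and the sketch does not reproduce the authors' simulation procedure. It only shows how the full-sample EPV of 36 follows from 252 deaths and seven predictors, and how many events each simulated EPV level would correspond to with seven predictors.

```python
def events_per_variable(n_events: int, n_variables: int) -> float:
    """Return the number of outcome events per predictor variable (EPV)."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_events / n_variables


def events_needed(target_epv: int, n_variables: int) -> int:
    """Return the number of events that corresponds to a target EPV."""
    return target_epv * n_variables


# Figures taken from the abstract: 252 deaths, 7 predictors.
N_EVENTS = 252
N_VARIABLES = 7

full_sample_epv = events_per_variable(N_EVENTS, N_VARIABLES)

# EPV levels examined in the simulations, per the abstract.
EPV_LEVELS = (2, 5, 10, 15, 20, 25)
events_by_epv = {epv: events_needed(epv, N_VARIABLES) for epv in EPV_LEVELS}

if __name__ == "__main__":
    print(f"Full-sample EPV: {full_sample_epv:.0f}")
    for epv, events in events_by_epv.items():
        print(f"EPV = {epv:>2} corresponds to {events} events")
```

Under these figures, the threshold of EPV = 10 reported in the abstract corresponds to 70 events for a seven-variable model.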
Pages: 1373-1379
Number of pages: 7