Five myths about variable selection

Cited by: 352
Authors
Heinze, Georg [1]
Dunkler, Daniela [1]
Affiliations
[1] Med Univ Vienna, Ctr Med Stat Informat & Intelligent Syst, Sect Clin Biometr, Spitalgasse 23, A-1090 Vienna, Austria
Keywords
association; explanatory models; multivariable modeling; prediction; statistical analysis; LIVER-TRANSPLANTATION; SURVIVAL; RECIPIENTS; EVENTS; MODEL;
DOI
10.1111/tri.12895
Chinese Library Classification (CLC) number
R61 [Surgical operations];
Abstract
Multivariable regression models are often used in transplantation research to identify or to confirm baseline variables which have an independent association, causally or only evidenced by statistical correlation, with transplantation outcome. Although sound theory is lacking, variable selection is a popular statistical method which seemingly reduces the complexity of such models. In fact, however, variable selection often complicates analysis, as it invalidates common tools of statistical inference such as P-values and confidence intervals. This is a particular problem in transplantation research, where sample sizes are often only small to moderate. Furthermore, variable selection requires computer-intensive stability investigations and a particularly cautious interpretation of results. We discuss how five common misconceptions often lead to inappropriate application of variable selection. We emphasize that variable selection and all problems related to it can often be avoided by the use of expert knowledge.
Pages: 6-10
Page count: 5