Fitting logistic regression models with contaminated case-control data

Authors
Cheng, K. F. [1]
Chen, L. C.
Affiliations
[1] Natl Cent Univ, Grad Inst Stat, Chungli, Taiwan
[2] Tamkang Univ, Dept Stat, Taipei, Taiwan
Keywords
case-control data; contamination; logistic regression; maximum likelihood; misclassification
DOI
10.1016/j.jspi.2005.07.009
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Measurement errors frequently occur when responses are observed. If case-control data are based on reported responses that may differ from the true responses, the data are contaminated case-control data. In this paper, we first show that ordinary logistic regression analysis based on contaminated case-control data can lead to seriously biased conclusions. This follows from a theoretical argument, one example, and two simulation studies. We next derive the semiparametric maximum likelihood estimate (MLE) of the risk parameter of a logistic regression model when a validation subsample is available. We establish the asymptotic normality of the semiparametric MLE and give a consistent estimate of its asymptotic variance. Our example and two simulation studies show that these estimates perform reasonably well in finite samples. © 2005 Elsevier B.V. All rights reserved.
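The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or one of its simulation studies: it only compares ordinary logistic regression on a case-control sample selected by the true response with one selected by a misreported response. The true coefficients (-1, 1), the standard normal covariate, the 10% flip rate applied independently of the covariate, and the group sizes are all assumptions chosen for illustration. The semiparametric MLE with a validation subsample is not implemented here.

```python
import math
import random


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_case_control(rng, n_per_group, flip_prob, beta0=-1.0, beta1=1.0):
    """Draw n_per_group cases and controls, selected by the REPORTED response.

    Assumption: the reported response equals the true response flipped with
    probability flip_prob, independently of the covariate x.
    """
    cases, controls = [], []
    while len(cases) < n_per_group or len(controls) < n_per_group:
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        if rng.random() < flip_prob:
            y = 1 - y  # contaminated (misreported) response
        group = cases if y == 1 else controls
        if len(group) < n_per_group:
            group.append(x)
    xs = cases + controls
    ys = [1.0] * len(cases) + [0.0] * len(controls)
    return xs, ys


def fit_logistic(xs, ys, n_iter=25):
    """Ordinary logistic regression (intercept + slope) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score, intercept component
            g1 += (y - p) * x    # score, slope component
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rng = random.Random(0)
_, slope_clean = fit_logistic(*draw_case_control(rng, 5000, flip_prob=0.0))
_, slope_contaminated = fit_logistic(*draw_case_control(rng, 5000, flip_prob=0.10))
print(f"slope, true responses:         {slope_clean:.3f}")
print(f"slope, contaminated responses: {slope_contaminated:.3f}")
```

Under these assumptions the slope fitted to the clean sample should land near the true value of 1, while the slope fitted to the contaminated sample is pulled toward zero. Only the slope is compared because the intercept of a case-control fit is shifted by the sampling fractions even without contamination.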
Pages: 4147-4160
Number of pages: 14