Exploring the Stability of Differential Item Functioning Across Administrations and Critical Values Using the Rasch Separate Calibration t-test Method
Cited by: 6
Authors:
Peabody, Michael R. [1]
Wind, Stefanie A. [2]
Affiliations:
[1] Amer Board Family Med, Psychometr Dept, Lexington, KY 40511 USA
[2] Univ Alabama, Educ Res Dept, Tuscaloosa, AL USA
Keywords:
Differential item functioning;
Rasch model;
certification;
multiple administrations;
MANTEL-HAENSZEL;
LOGISTIC-REGRESSION;
RESPONSE THEORY;
SIZE;
BIAS;
DOI:
10.1080/15366367.2018.1533782
Chinese Library Classification:
C [Social Sciences, General];
Discipline Classification Code:
03;
0303;
Abstract:
Differential item functioning (DIF) detection procedures provide validity evidence for proposed interpretations of test scores, helping researchers and practitioners ensure that scores are free from potential bias and that individual items do not advantage any subgroup of examinees over another. In this study, we use the Rasch separate calibration t-test method to examine how different levels of contrast, at varying levels of statistical significance, affect the flagging of items across multiple examination administrations. We assert that if DIF is a stable trait of an item, it should be sample-independent and detected each time the item is administered. We therefore examine how consistently different alpha levels and critical values identify DIF for the same items across multiple administrations. Our results suggest that, under our most lenient criteria, approximately 40% of items on any single administration may be flagged for DIF, but this rate drops to 20% when items must be flagged on two administrations and to 12% across three. Testing organizations can use the methods illustrated here to set their own DIF thresholds, which may be useful when estimating the time and cost of having independent reviewers examine flagged items.
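As a rough illustration of the separate calibration t-test logic described in the abstract, the Python sketch below assumes item difficulty estimates and standard errors are already available from two independent Rasch calibrations (reference and focal groups) and computes the difficulty contrast and pooled-error t statistic. The function names, the 0.5-logit contrast threshold, and the |t| > 2.0 critical value are illustrative assumptions, not values taken from the paper.

import math

def dif_t(d_ref, se_ref, d_foc, se_foc):
    # t statistic for one item: difficulty contrast between the two
    # separately calibrated groups, divided by the pooled standard error.
    return (d_ref - d_foc) / math.sqrt(se_ref**2 + se_foc**2)

def flag_item(d_ref, se_ref, d_foc, se_foc, min_contrast=0.5, t_crit=2.0):
    # Flag an item for DIF only when both the absolute logit contrast
    # and the absolute t statistic exceed their (assumed) thresholds.
    contrast = d_ref - d_foc
    t = dif_t(d_ref, se_ref, d_foc, se_foc)
    return abs(contrast) >= min_contrast and abs(t) >= t_crit

# Example: an item estimated at 0.85 logits (SE 0.12) in the reference
# group and 0.20 logits (SE 0.15) in the focal group.
print(flag_item(0.85, 0.12, 0.20, 0.15))  # True: contrast 0.65, t ~ 3.4

Under the study's stability argument, an item would be retained as exhibiting DIF only if it is flagged in this way on repeated administrations, e.g., by intersecting the sets of flagged items from two or three administrations.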
Pages: 78-92
Page count: 15